REMARKS

THE INVENTION.

This invention provides for an improved nucleic acid polymerase having greater 
processivity than the natural enzyme. The improved processivity arises from the fusion of a 
nucleic acid binding protein to the polymerase. 

AMENDMENTS.

Claim 30 has been amended to recite the specific amino acids that comprise
Sac7d. This is not new matter as the sequence of Sac7d is known.

Claim 40 has been amended to delete an extra period. 

REJECTIONS.

Claim objections: Claim 40 was objected to because of a typographical error. Claim 40
is amended to delete the additional period.

35 U.S.C. §112 2nd paragraph 

Claims 30-42 were rejected as indefinite because the specific sequence of Sac7d 
was not set forth with particularity. Sac7d is a known sequence. As the Examiner noted 
correctly, it is fused to the N terminus of deltaTaq as described in Sequence ID. No. 10. Sac7d is 
amino acids 7 through 71 of this fusion protein. Sac7d is fused to the deltaTaq through a short 
linker of GGGVT. 

As presently amended, claim 30 now recites with specificity the exact sequence
that represents Sac7d. It is believed that this basis for rejection is fully addressed by the
amendment to claim 30.

35 U.S.C. §112 1st paragraph - Description

The Examiner has rejected claims 15-18 and 22-29 as lacking description. The 
Examiner notes that the specification provides for three species of fusion proteins where the 
DNA binding domain increases processivity of the polymerase domain. The basis for this 
rejection is that the applicant has failed to describe particular structure activity relationships 
[SAR] beyond the full length Sso7d protein. 

Applicant acknowledges the Examiner Corps' recent concern over the description 
requirement in biotechnology applications relating to novel proteins. However, this concern 
should not give rise to confusion between that requirement and the enablement requirement. 
Under modern case law, the description and enablement requirements remain two distinct
requirements. The description requirement ensures the public that the inventor understood what 
his invention was at the time of filing (possession) and the enablement requirement ensures the 
public that they will be taught how to practice the invention after the patent expires. 

In the instant situation, the description based rejection is supported entirely by an 
enablement concern. The Examiner states that the description rejection is raised because claims 
read on Sso7d domains that are defined by their ability to bind to polyclonal antibodies generated 
against the parent domain Sso7d. According to the Examiner, the pending claims fail to 
establish that the applicant was in possession of the invention because there was no SAR data 
provided to teach those of skill how to make variations of the exemplified species. 

It is difficult to respond to this rejection because it applies reasoning appropriate
for a lack-of-enablement rejection to a description-based rejection. The recent
blending of the reasons underlying description and enablement rejections arose with the Federal
Circuit's decision in University of California v. Eli Lilly and Co., where the Federal Circuit
affirmed the invalidity of the U.C.'s patent claims to any insulin-encoding gene including claims
directed to the human insulin gene.

There were two basic reasons why these claims were ruled invalid. First, the U.C. 
did not actually possess the sequence of the human insulin gene or of any other insulin gene 
other than the rat. Second, the U.C.'s specification did not teach how to find other insulin genes. 

Before the U.C. decision and before the Guidelines were published, the U.C. 
claims would have been properly rejected solely under the enablement requirement. However, 
the blending of the two requirements does have an origin earlier than the U.C. v. Lilly case.
There is CCPA case law that recognized a practical overlap between the two requirements. In 
those early cases, the court recognized that you can't teach (enable) what you don't possess 
(description). But regardless of the more recent blending of the description and enablement 
requirements, the intent of law has not changed. You can't claim what you don't possess and 
you can't claim what you don't teach. The rat insulin gene that U.C. discovered in 1979 was the
first insulin gene found. Indeed, U.C. didn't possess any insulin gene other than rat and
didn't teach how to find other genes encoding insulin. It failed both tests.

Applicant does not disagree with this law. But its application to applicants' facts 
represents an improper extension of that law because the subject invention is a new use for an 
old and well-known family of proteins. Thus, the description rejection is contrary to traditional 
chemical patent practice and invites copiers to routinely engineer around the claims in 
contravention of the purpose of the patent statute. 

In the U.C. patent reciting claims for genes encoding insulin, the heart of the
invention was a new family of genes that was represented by a single species (rat). As the
Examiner knows, the closer an element is to the "core of the claim's patentability," the more
enablement of that element is required to comply with the enablement requirement. In the U.C.
claims, the heart of the invention was a novel genus of genes and only one member was
described.

In contrast to the U.C. facts, the DNA binding domains of this invention are not a 
new family of genes. The invention is simply the insightful recognition that processivity of 
nucleic acid polymerases can be improved by the fusion of the polymerase to non-sequence 
specific DNA binding domains. Both parts of that fusion are known. There are three major 
classes of DNA binding proteins identified by the specification. The pending claims are focused 
on one family, i.e., basic chromosomal proteins from hyperthermophilic archaebacteria. Unlike
rat insulin in 1979, the DNA binding protein family of the pending claims is a well recognized
genus of related proteins. Attached to this response is a declaration by Dr. Peter Vander Horn.
Dr. Vander Horn has presented a BLAST search of Sso7d that identified a group of related proteins,
any of which could be used in the invention to improve processivity of the polymerase domain.

Regardless of how you legally characterize the outstanding rejection, the 
Examiner's primary concern is that polyclonal antibodies will recognize muteins (man-made 
amino acid modifications) that are inoperable. And without information regarding structure and 
activity, one would not know how to sort the inoperable from the operable embodiments without 
undue experimentation. In response, applicant urges that the specification need not disclose
what is already known or readily apparent to those of skill.

In his Rule 132 Declaration, Dr. Vander Horn provides three objective means for
identifying modifications that retain function of the Archaea 7 kDa proteins. The last of these
means is based on SAR information. First, Dr. Vander Horn explains that the natural variation
among the members of the family provides a preliminary road map for variations. Second, there
are the typical conserved substitutions of amino acids that are possible beyond naturally existing
variation. Finally, Dr. Vander Horn provides the SAR information already reported in the prior
art that distinguishes the non-critical regions from the functional domains.

Finally, applicant would ask the Examiner to take note that the specification
provides an entire section entitled "ASSAYS TO DETERMINE IMPROVED ACTIVITY FOR 
THE CATALYTIC DOMAINS." It is a long section beginning on page 28 and ending on page 
30. This section invites those of skill to test the modifications suggested in the preceding section 
to ensure that processivity is actually improved. 

To say that applicant has failed to describe the genus because polyclonal 
antibodies will recognize muteins for which no specific guidance has been taught is to ignore the 
fact that this is a large family of known proteins. Indeed this is not an unusual method of 
claiming proteins. Attached to this response are three patents that claim proteins using similar 
claim language (Exhibits 1-3). 

Finally, the Examiner's reasoning is internally inconsistent when viewed in light 
of the other claim elements, i.e., polymerase. The claim element reciting "polymerase" is subject 
to the same arguments regarding which muteins are operable and which are inoperable. 
Ostensibly, the reason the "polymerase" language is not similarly rejected is because the 
structure and function of polymerases are well known. As is demonstrated by the above
arguments and by Dr. Vander Horn's Rule 132 Declaration, the same is true for the Archaea-type
DNA binding proteins.

Applicant believes that the above remarks fully address the Examiner's concern
over the description requirement and requests withdrawal of this basis for rejecting the pending
claims.

35 U.S.C. §112 1st paragraph - Enablement

Claims 15-29 and 30-42 are rejected as lacking enablement. These claims define
the non-specific DNA binding domain by sequence identity to prototype sequences. The
parent application, now issued as U.S. Pat. No. 6,627,424, has claims reciting a binding domain
having at least 90% homology to Sso7d. The pending claims 15-29 recite 50%, 75% and 85%
identity to Sso7d, and claims 30-42 recite similar percent identities for Sac7d.
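
By way of illustration only, a percent identity of the kind recited in the claims can be understood as the share of aligned positions at which two sequences carry the same amino acid. The following is a minimal Python sketch under stated assumptions: the two sequences are already aligned to equal length with '-' marking gaps, gapped positions are left out of the denominator (other conventions exist), and the function name and the short example sequences are hypothetical and are not taken from Sso7d, Sac7d or the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped aligned positions where both sequences match.

    Assumption: inputs are pre-aligned, equal-length strings of one-letter
    amino acid codes, with '-' marking a gap. Positions where either
    sequence has a gap are excluded from the count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped positions
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical ten-residue prototype and variant (illustrative only).
prototype = "MKVTFKYKGE"
variant = "MKITFRYRGE"
print(f"{percent_identity(prototype, variant):.1f}% identity")
```

Under this convention, a variant that differs from a ten-residue prototype at three positions has 70% identity. A claim limit such as 50% would correspondingly tolerate differences at up to half of the compared positions.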

The Examiner faults this "percentage based" approach because the claims, with
their recital of percentages below 90%, read on muteins that have not been identified and
whose identification would require undue experimentation. His objective reasoning is that without
any structure activity relationship [SAR] data, one of skill would be at a loss to know which
amino acids are critical to function and which are not critical.

During the prosecution of the applicant's parent application, now issued as U.S. 
Pat. No. 6,627,424, Examiner Hutson was confident that at a 90% sequence identity, routine 
experimentation could identify other muteins in the absence of SAR data. Dr. Vander Horn's 
declaration is intended to provide objective reasons why the percentage recited by the pending 
claims is properly set at 50%. 

Natural variation provides 76% identity.

According to Dr. Vander Horn, a GenBank search of Sso7d readily identifies at
least 18 known DNA binding proteins that have amino acid identities of between 79% and 98%. In a
recent search, he identified an endonuclease from the archaeon Methanococcus jannaschii with a
subsequence with a 47% identity to Sso7d.

Clearly, this group of proteins represents an old family tree. The natural genetic 
drift is a road map to novel muteins. As Dr. Vander Horn explains, to limit the claims to a 
percentage above that found within the naturally occurring variants is to ignore that nature has 
provided a road map to muteins. It eviscerates the commercial value of the claimed invention by 
inviting copiers to engineer around the invention using routine mutagenesis. In section 13 of his
declaration, Dr. Vander Horn has created a hybrid protein combining known natural variations to
obtain a protein with 76% identity to Sso7d.

Use of conservative substitution and SAR data lowers the percent identity to below 60%.

In addition to the natural variations between family members, any protein chemist
readily understands that non-naturally occurring but conserved substitutions are possible
throughout the primary sequences of the prototype proteins. Dr. Vander Horn explains this
convention at section 9 of his Declaration. Dr. Vander Horn further explains in his declaration at
section 10 that the SAR of the archaeal protein interaction with DNA had been previously studied
by workers such as Gao. This information permits a routineer to identify the critical binding
domains in the proteins and to focus mutations away from these domains.

Combining all this knowledge permits those of skill to routinely identify species 
of protein that have a sequence identity of less than 60%. An example of such a mutein is 
provided by Dr. Vander Horn in section 14 of his Declaration. 

Having provided objective evidence that the claim limitation to 50% identity to 
Sso7d is a reasonable approximation of the ability of protein chemists to alter the primary 
sequence of the prototype while maintaining biological function, applicants submit that they 
have fully rebutted the prima facie case of non-enablement for claims 15-29 and 30-42. 

Beyond the objective evidence provided above, legal precedent supports the 
Examiner allowing claims of the scope presently pending. Unlike the situation in U.C. v.
Lilly, applicant is merely fusing two known protein families. The law is clear. You don't have
to enable what is already known. Here is an example of a court decision that is on point with our
facts and should clarify applicant's position with regard to enablement requirements regarding
claims reciting old elements: Application of Herschler, 200 USPQ 711 (CCPA 1979).

In Herschler, the applicant had discovered that dimethylsulfoxide (DMSO) was 
useful as a transdermal carrier for physiologically active steroids. The CCPA found that a 
priority application describing a single steroid (dexamethasone 21-phosphate) supported a claim
to the genus of all steroids. The CCPA explained that Herschler's claims were not drawn to a
novel steroid but to the method of administration of steroids. As long as the class of steroids 
could be expected to be carried across the skin by DMSO, the claim could encompass any 
steroid, known or unknown. Following earlier case law, the CCPA reminded the Patent Office 
that the inventive principle was directed to a method of administration of steroids and that the 
specific steroid exemplified was not the point of patentability. 

Herschler illuminates the instant case. The inventive principle in Herschler was a 
method for transdermal transport of all steroids, not the transport of dexamethasone 21- 
phosphate specifically. Similarly, the inventive principle of the instant invention is improving 
the processivity of polymerases by fusing them with a non-specific DNA binding protein. 

Herschler provides guidance in identifying the inventive principle. There the
court stated:

The solicitor urges that the class of steroids is so large 
that a single example in the specification could not describe the 
varied members with their further varied properties. We disagree 
with this contention. Steroids, when considered as drugs, have a 
broad scope of physiological activity. On the other hand, 
steroids, when considered as a class of compounds carried through 
a layer of skin by DMSO, appear on this record to be chemically 
quite similar. (Herschler at 717; Italics added) 

The CCPA is saying that the PTO mistakenly focused its concern regarding enablement on the 
claim element "steroids." Logically following that error, the PTO then argued that all steroids 
were not yet known and therefore any claim embracing the entire genus was not enabled. This 
was an irrelevant truth because the initial premise was in error: the inventive element was not
steroids but their use in combination with a transdermal carrier.

In a perfectly parallel fashion, the instant invention concerns the fusion of a DNA- 
binding protein to a polymerase to improve processivity. The fact that not all DNA-binding 
proteins are known is an irrelevant truth because you don't need that degree of enablement to 
allow a claim that does not rely on that element for its patentability. One of skill would
understand that many DNA binding proteins from archaea, as a genus, are capable of binding
DNA nonspecifically. And, if provided with a novel protein, one of skill could easily determine, 
with no undue experimentation, whether or not the novel peptide binds nonspecifically to nucleic 
acid. 

The applicant notes that the Examiner has relied on In re Fisher to support his
position. Fisher is still good law, but it is entirely inapplicable to the facts presented here. In
Fisher, the invention was a hormone, ACTH, having 39 amino acids. The rejected claim was
broad and read on any ACTH protein having the first 24 amino acids of ACTH.

These were the 24 residues conserved across ACTH from several animals. While 
such a claim may not be a problematic claim today, in 1970 it was not technically possible to 
make ACTH chemically and all the natural species had 39 amino acids. Because there was no
way to make a 24 amino acid ACTH, the claim was properly rejected by the CCPA as non-enabled.
As the court said:

We have already discussed, with respect to the parent 
application, the lack of teaching of how to obtain other-than-39 
amino acid ACTHs. That discussion is fully applicable to the
instant application, and we think the board was correct in
finding insufficient disclosure due to this broad aspect of the
claims.

In view of today's technology, the rejection of a protein based on a "signature 
sequence" is no longer an issue because of advances in protein chemistry. There was nothing 
inherently wrong with the Fisher claim structure — it was simply written before technology could 
enable it. That is not true in our situation. Following natural variations as a road map and 
applying routine mutagenesis techniques, those of skill can routinely create variations of Sso7d 
and Sac7d that are 50% identical to each other or greater.

In consideration of the above remarks, Dr. Vander Horn's Rule 132 Declaration and the
legal arguments set forth above, applicant respectfully submits that the enablement rejection of
claims 15-29 and 30-42 is fully rebutted.



Double Patenting 



The Examiner has rejected the pending claims under the judicially created 
doctrine of obviousness-type double patenting over U.S. Pat. No. 6,627,424. A Terminal 
Disclaimer is submitted with this Response. 



Applicant believes that all the outstanding issues raised by the Examiner have 
been fully addressed and the claims are in condition for allowance. If the Examiner believes a 
telephone conference would expedite prosecution of this application, please telephone the 
undersigned at 415-576-0200. 



Respectfully submitted, 

Kenneth A. Weber
Reg. No. 31,677

TOWNSEND and TOWNSEND and CREW LLP

Two Embarcadero Center, 8th Floor

San Francisco, California 94111-3834

Tel: 415-576-0200

Fax: 415-576-0300

KAW:jhd

Encls.: Rule 132 Declaration
Terminal Disclaimer
Petition to Amend Inventorship

Exhibits 1-3 (Claims of U.S. Pat. Nos. 5,786,210; 6,307,020 B1 and 6,392,024 B1)

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: [number illegible]

[Amino acid sequence listing not recoverable from the scanned exhibit.]


What is claimed is:

1. An isolated nucleic acid encoding a mammalian thymokine of approximately 11 kD in
unglycosylated form, wherein said thymokine protein:

i. specifically binds to polyclonal antibodies generated against an immunogen selected
from the group consisting of:

a) the polypeptide of SEQ ID NO. 2; and

b) the polypeptide of SEQ ID NO. 4; and

ii. has the following chemical properties:

a) the ability to induce a dose-dependent chemotactic response by thymocytes in a
thymokine cell chemotaxis assay;

b) the inability to induce a dose-dependent chemotactic response in human THP-1 cells
in said thymokine cell chemotaxis assay; and

c) the inability to induce an intracellular Ca2+ flux in human THP-1 cells in an
intracellular Ca2+ flux assay.

2. An isolated nucleic acid of claim 1 wherein the nucleic acid encodes the amino acid
sequence of SEQ ID NO. 2.

3. An isolated nucleic acid of claim 1 wherein the nucleic acid encodes the amino acid
sequence of SEQ ID NO. 4.

4. An isolated nucleic acid of claim 1 consisting of the coding sequence selected from
the group consisting of SEQ ID NO. 1 and 3.

5. An isolated nucleic acid of claim 1 wherein the nucleic acid is joined to a
recombinant vector.

6. An isolated nucleic acid of claim 1 wherein the nucleic acid is contained within a
cell able to express the thymokine encoded by the nucleic acid.

* * * * *

ABSTRACT



A family of related intracellular skin type antifreeze 
polypeptides and corresponding coding nucleic acids are 
provided. These are the first skin type intracellular antifreeze 
polypeptides and coding nucleic acids ever reported. The 
polypeptides are naturally expressed in the skin of Winter 
Flounder, and skin specific promoters are also provided. The 
polypeptides are used to make cells cold-resistant, and to 
improve the palatability of cold foods and liquids. Cold 
resistant eukaryotes and prokaryotes, including plants, ani- 
mals and bacteria are made using the skin-type intracellular 
antifreeze polypeptides and nucleic acids. 

14 Claims, 20 Drawing Sheets 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Xaa Asp Thr Xaa Xaa Lys Xaa 
1 5 






What is claimed is:

1. An isolated skin-type intracellular antifreeze
polypeptide, wherein

the polypeptide comprises an N terminal Met-Asp-Ala-Pro
(SEQ ID NO:1) subsequence;

the polypeptide comprises an internal Ala-Ala-Thr-Ala-
Ala-Ala-Ala-Lys-Ala-Ala-Ala (SEQ ID NO:2) subsequence;

the polypeptide does not comprise a signal sequence;

the polypeptide induces a concentration-dependent
decrease in the freezing point of an aqueous solution;
and,

conservative modifications thereof.

2. The isolated polypeptide of claim 1, wherein the
polypeptide has a molecular weight of about 3400 Da.

3. The isolated polypeptide of claim 1, wherein the
polypeptide has an N terminal Met-Asp-Ala-Pro-Ala (SEQ
ID NO:9) sequence.

4. The isolated polypeptide of claim 1, wherein the
polypeptide is from about 35 to about 55 amino acids in
length.

5. The isolated polypeptide of claim 1, wherein the
polypeptide comprises the sequence Met-Asp-Ala-Pro-Ala-
X1-Ala-Ala-Ala-Ala-Thr-Ala-Ala-Ala-Ala-Lys-Ala-Ala-
Ala-Glu-Ala-Thr-X2-Ala-Ala-Ala-Ala-X2-Ala-Ala-Ala-X3-
Thr (SEQ ID NO:3); wherein,

X1 is selected from the group consisting of Arg, Lys, and
Ala;

X2 is selected from the group consisting of Lys and Ala;
and,

X3 is selected from the group consisting of Ala and Asp
and a bond.

6. The isolated polypeptide of claim 1, wherein the
polypeptide is selected from the group consisting of sAFP1
(SEQ ID NO:16), sAFP2 (SEQ ID NO:18), sAFP3 (SEQ ID
NO:20), sAFP4 (SEQ ID NO:22), sAFP5 (SEQ ID NO:24),
sAFP6 (SEQ ID NO:26), sAFP7 (SEQ ID NO:28), sAFP8
(SEQ ID NO:30), and 11-3 (SEQ ID NO:32).

7. The isolated polypeptide of claim 1, wherein the
polypeptide binds to a pool of subtracted polyclonal antibodies,
wherein the subtracted polyclonal antibodies are raised
against the sAFP1 (SEQ ID NO:16) polypeptide and subtracted
with an HPLC-6 polypeptide (SEQ ID NO:39).

8. The isolated polypeptide of claim 1, wherein the
isolated polypeptide is a component of an aqueous solution.

9. The isolated polypeptide of claim 1, wherein the
polypeptide is from about 60% to about 70% helical as
measured by circular dichroism.

10. The isolated polypeptide of claim 1, wherein the
polypeptide is a fusion protein.

11. The isolated skin-type intracellular antifreeze
polypeptide of claim 1, which is encoded by a nucleic acid
molecule, which nucleic acid molecule hybridizes to a skin
type antifreeze nucleic acid molecule selected from the
group consisting of sAFP1 (SEQ ID NO:15), sAFP2 (SEQ
ID NO:17), sAFP3 (SEQ ID NO:19), sAFP4 (SEQ ID
NO:21), sAFP5 (SEQ ID NO:23), sAFP6 (SEQ ID NO:25),
sAFP7 (SEQ ID NO:27), sAFP8 (SEQ ID NO:29), F2 (SEQ
ID NO:33) and 11-3 (SEQ ID NO:31) in a northern blot
under high stringency wash conditions of 0.015 M NaCl at
72° C., wherein the nucleic acid molecule does not hybridize
to SEQ ID NO:41 under high stringency wash conditions of
0.015 M NaCl at 72° C.

12. The isolated polypeptide of claim 11, wherein the
polypeptide is selected from the group consisting of sAFP1
(SEQ ID NO:16), sAFP2 (SEQ ID NO:18), sAFP3 (SEQ ID
NO:20), sAFP4 (SEQ ID NO:22), sAFP5 (SEQ ID NO:24),
sAFP6 (SEQ ID NO:26), sAFP7 (SEQ ID NO:28), sAFP8
(SEQ ID NO:30), and 11-3 (SEQ ID NO:32).

13. A method of making an aqueous composition resistant
to freezing, comprising adding a skin-type antifreeze
polypeptide to the composition in an amount sufficient to
change the thermal hysteresis of the composition, wherein
the skin-type antifreeze polypeptide comprises an N terminal
Met-Asp-Ala-Pro (SEQ ID NO:1) subsequence, and an
internal Ala-Ala-Thr-Ala-Ala-Ala-Ala-Lys-Ala-Ala-Ala
(SEQ ID NO:2) subsequence; and wherein the polypeptide
does not comprise a signal sequence.

14. The method of claim 13, wherein the step of adding
the skin type antifreeze peptide is performed in a cell,
wherein the skin type antifreeze polypeptide is added to the
cell by transforming the cell with a nucleic acid which
encodes the skin type antifreeze polypeptide and expressing
the antifreeze polypeptide in the cell.

(57) ABSTRACT

A novel class of thermal hysteresis (antifreeze) proteins
(THP) that have up to 100 times the specific activity of fish
antifreeze proteins has been isolated and purified from the
mealworm beetle, Tenebrio molitor. Internal sequencing of
the proteins, leading to cDNA cloning and production of the
protein in bacteria has confirmed the identity and activity of
the 8.4 to 10.7 kDa THP. They are novel Thr- and Cys-rich
proteins composed largely of 12-amino-acid repeats of
cys-thr-xaa-ser-xaa-xaa-cys-xaa-xaa-ala-xaa-thr. At a concentration
of 55 μg/mL, the THP depressed the freezing point 1.6° C.
below the melting point, and at a concentration of ~1
mg/mL the THP or its variants can account for the 5.5° C.
of thermal hysteresis found in Tenebrio larvae. The THP
function by an adsorption-inhibition mechanism and produce
oval-shaped ice crystals with curved prism faces.

19 Claims, 7 Drawing Sheets

lambda-Zap cDNA library"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

CACTGCACTG GGGGTGCTGA TTGTACTAGT TGTACAGATG CATGCACTGG TTGTGGAAAT 60

TGTCCAAATG CACATACGTG TACCGATTCC AAAAATTGTG TCAAGGCAGC AACATGTACT 120

GGATCTACAA AATGTAATAC CGCCAGGACG TGTACAAACT CAAAAGACTG TTTTGAAGCC 180

AAAACATGTA CTGACTCAAC CAACTGTTAC AAAGCTACAG CCTGTACCAA TTCAACAGGA 240

TGTCCCGGAC ATTAAGTTTT TCTATTGTCA ACAATAATAA AACACACTTA CTGTTATCTT 300

AGCTAAAACA TAA 313

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys Xaa Xaa Xaa Cys Xaa Xaa
1 5 10 15

Cys Xaa Xaa Cys
20



What is claimed is:

1. An isolated nucleic acid encoding an antifreeze protein,
said protein defined as follows:

(i) having a calculated molecular weight of between 7 and
13 kDa;

(ii) having a thermal hysteresis activity greater than 1.5°
C. at about 1 mg/mL;

(iii) having a nucleic acid subsequence which encodes the
N-terminal motif set forth in Seq. ID No. 3;

(iv) specifically binding to polyclonal antibodies raised
against antifreeze protein selected from the group consisting
of YL-1, YL-2, YL-3 or YL-4; and,

(v) having at least about 70% amino acid sequence
identity to an antifreeze protein selected from the group
consisting of YL-1, YL-2, YL-3 or YL-4.

2. The isolated nucleic acid of claim 1, wherein the
calculated molecular weight of the encoded protein is
between about 8 and 12 kDa.

3. The isolated nucleic acid of claim 1, wherein the
thermal hysteresis activity is greater than 2° C. at 1 mg/mL.

4. The isolated nucleic acid of claim 1, wherein the
encoded protein includes a subsequence of amino acids
corresponding to Seq ID No: 4.

5. The isolated nucleic acid of claim 1 selected from the
group consisting of: YL-1 (SEQ ID NO. 10), YL-2 (SEQ ID
NO. 12), YL-3 (SEQ ID NO. 16) and YL-4 (SEQ ID NO.
14).

6. An isolated nucleic acid which specifically hybridizes
to SEQ ID NO: 2 under stringent wash conditions of
0.2×SSC at 65° C. for 15 minutes and encodes an antifreeze
protein having a thermal hysteresis activity greater than
about 1.5° C. at about 1 mg/mL.

7. An isolated nucleic acid which specifically hybridizes
under stringent wash conditions to the subsequence of SEQ
ID NO: 12 from nucleotides 105 to 359, wherein the nucleic
acid encodes a thermal hysteresis protein that lacks a signal
sequence.

8. The isolated nucleic acid of claim 7, wherein the
nucleic acid specifically hybridizes under highly stringent
wash conditions.

9. The isolated nucleic acid of claim 7 which encodes the
12 amino acid motif of SEQ ID NO:1.

10. An expression vector comprising the isolated nucleic
acid of claim 1 operably linked to a promoter.

11. The expression vector of claim 10, wherein the
promoter is heterologous.

12. The expression vector of claim 11, wherein the
promoter is constitutive.

13. A cell into which, or into an ancestor of which, the
isolated nucleic acid of claim 1 has been introduced.

14. The cell of claim 13, wherein the isolated nucleic acid
sequence is translated into an antifreeze protein which is
expressed externally from the cell.

15. The cell of claim 13, wherein said cell is a fungus.

16. The cell of claim 15, wherein the fungus is a yeast.

17. The cell of claim 16, wherein the yeast is selected
from the group consisting of Torulopsis holmii, Saccharomyces
fragilis, Saccharomyces cerevisiae, Saccharomyces
lactis, and Candida pseudotropicalis.

18. The cell of claim 13, wherein the cell is a bacterium.

19. The cell of claim 18, wherein the bacterium is selected
from the group consisting of Streptococcus cremoris, Streptococcus
lactis, Streptococcus thermophilus, Leuconostoc
citrovorum, Leuconostoc mesenteroides, Lactobacillus
acidophilus, Lactobacillus lactis, Bifidobacterium bifidum,
Bifidobacterium breve, and Bifidobacterium longum.

* * * * *

Key words: DNA binding protein; Radius of gyration; Amino acid methylation; Microsequence analysis;

(S. solfataricus)

DNA-binding proteins have been extracted from the thermoacidophilic archaebacterium Sulfolobus
solfataricus strain P1, grown at 86 °C and pH 4.5. These proteins, which may have a histone-like function, were
isolated and purified under standard, non-denaturing conditions, and can be grouped into three molecular
mass classes of 7, 8 and 10 kDa. We have purified to homogeneity the main 7 kDa protein and determined its
DNA-binding affinity by filter binding assays and electron microscopy. The Stokes radius of gyration
indicates that the protein occurs as a monomer. The complete amino-acid sequence of this protein contains
14 lysine residues out of 63 amino acids and the calculated Mr is 7149. Five of the lysine residues are
partially monomethylated to varying extents and the methylated residues are located exclusively in the
N-terminal (positions 4 and 6) and the C-terminal (positions 60, 62 and 63) regions only. The protein is
strongly homologous to the 7 kDa proteins of Sulfolobus acidocaldarius with the highest homology to protein
7d. Accordingly, the name of this protein from S. solfataricus was assigned as DNA-binding protein Sso7d.



Introduction

The mode of packing for eukaryotic DNA is
well established. A set of small basic proteins, the
histones, are involved in the formation of compact
DNA-protein particles which contain the double-helical
DNA coiled around an octameric histone
complex [1]. In bacteria, the mechanism for folding
the long circular DNA molecule into a compact
form is much less clear. Although a number
of proteins have been implicated for this function
[2], a precise description of the composition of
'bacterial chromatin' is not yet available.

Abbreviations: TPCK, L-1-tosylamido-2-phenylethylchloromethyl
ketone; DABITC, 4-N,N'-dimethylaminoazobenzene-4'-isothiocyanate;
SSC, 0.15 M trisodium citrate/0.015 M
NaCl (pH 7.0); PMSF, phenylmethylsulphonyl fluoride; BSA,
bovine serum albumin; PTH, phenylthiohydantoin.

Correspondence: T. Choli, Max-Planck-Institut für Molekulare
Genetik, Abteilung Wittmann, Ihnestr. 73, D-1000 Berlin 33
(Dahlem), Germany.

Although the structure and composition of the
bacterial nucleoids are not very well defined, there
is compelling evidence that bacterial DNA is
folded into a compact complex [3,4] through the
participation of at least three proteins [5]. In recent
years, several histone-like DNA-binding proteins
have been isolated from eubacteria, called
NS1 and NS2, HU, HD or DNA-binding protein
II. Their amino-acid sequences have been determined
and are currently under further investigation
[6-10]. Significant homologies have been
found between the eubacterial proteins and the
first protein isolated from the archaebacterium
Thermoplasma acidophilum (for reference see Ref.
8). Previously, at least two groups of DNA-binding
proteins with estimated molecular masses of 9
kDa and 6 kDa were found in several Sulfolobus
species [11]. From our results it has become clear
that Sulfolobus acidocaldarius contains several
DNA-binding proteins of similar sizes with Mr
values of 7000, 8000 and 10000 [12,13], of which
the predominant protein, 7d [14], and three of the
minor components (proteins 7a, 7b and 7e) have
been sequenced recently [15].

In this paper we present the isolation, characterization
and primary structure determination
of the predominant 7 kDa protein from Sulfolobus
solfataricus strain P1 and compare its sequence
with that of the other known bacterial DNA-binding
proteins. Our nomenclature for these proteins
in the 7 kDa class is based on the increased
basicity of the proteins in the order 7a to 7e due
to their charge differences [12]. To avoid confusion,
it should be pointed out that the primary
structure of the dominant 7 kDa protein from S.
acidocaldarius DSM 1616 has been determined
[14], but at that time the organism was named
Sulfolobus solfataricus DSM 1616. Comparison of
DNA-binding proteins, characterization of ribosomal
proteins by two-dimensional gel electrophoresis
and the immunological characterization
of RNA-polymerase subunits had demonstrated
clearly that the strain DSM 1616 is similar
although not identical to S. acidocaldarius DSM
639 and different from other S. solfataricus strains
[13]. Therefore, this strain was renamed S.
acidocaldarius DSM 1616.

Experimental procedures 

Materials 

Sodium dodecylsulfate (SDS) was obtained
from Serva (Heidelberg, F.R.G.). TPCK-trypsin
was obtained from Worthington (Freehold, NJ,
U.S.A.). DABITC was from Fluka (Buchs,
Switzerland), and recrystallized from boiling
acetone. Ovalbumin, chymotrypsinogen A,
myoglobin, cytochrome c and bovine trypsin
inhibitor were from Serva (Heidelberg, F.R.G.).
The scintillation cocktail was Beckman Ready-Solv(TM),
Beckman (Berkeley, CA, U.S.A.). All solutions
used for protein purification contained 0.1
mM PMSF, 0.1 mM benzamidine hydrochloride
and 6 mM 2-mercaptoethanol. Nε-Monomethyllysine
and the other methylated lysine derivatives
were purchased from Serva and CalBiochem
(Frankfurt, F.R.G.). Acetonitrile and 2-propanol
for HPLC solutions were of LiChrosolv grade and
all other chemicals were of pro analysi grade,
purchased from Merck (Darmstadt, F.R.G.).

Methods 

S. solfataricus strain P1 was obtained from W.
Zillig (Munich), and cells were grown at 86 °C
under conditions described in Ref. 12, with the
addition of 1 g per liter casamino acids (Difco,
Detroit, MI, U.S.A.) to the medium.

Purification of the DNA-binding protein. S.
solfataricus cells were suspended in Polymix-Hepes
buffer [16]. After addition of DNAase I (RNAase
free), the cells were broken twice in a Gaulin-Manton
press (General Electric, Fort Wayne, IN,
U.S.A.) at 72 MPa (9000 lb/inch²). Cellular debris
was removed by centrifugation (1.5 h at
10000 × g) and the salt concentration of the supernatant
was raised to 1 M NH4Cl. Ribosomes
were separated from smaller proteins by centrifugation
overnight at 160000 × g. The supernatant
was dialysed against 10 mM phosphate buffer at
pH 6.0 and applied onto a CM-Sepharose CL-6B
column (5 × 40 cm). Proteins were eluted with a
linear NaCl gradient from 0.05 to 0.8 M in 10 mM
phosphate buffer at pH 6.0 (20 l, flow rate 100
ml/h). Fractions of 30 ml were collected and assayed
for protein content by SDS-polyacrylamide gel
electrophoresis (SDS-PAGE). Further purification
was obtained by gel filtration on Sephadex G-50
superfine in 0.35 M NaCl and additionally by
ion-exchange chromatography on Fractogel TSK
CM-650 (S) with a linear NaCl gradient from 0.1
to 0.5 M.
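
As a rough aid for locating a given salt concentration in the CM-Sepharose run, the following minimal sketch maps NaCl molarity to fraction number for an ideal linear gradient (0.05 to 0.8 M over 20 l, 30 ml fractions, as stated above). It assumes the gradient starts with the first collected fraction and ignores column dead volume and mixing; the function names are illustrative only and not part of the published procedure.

```python
import math


def gradient_concentration(volume_ml, start_m=0.05, end_m=0.8, total_ml=20000.0):
    """NaCl molarity after `volume_ml` of an ideal linear gradient.

    Assumption: the gradient is perfectly linear over `total_ml`.
    """
    if not 0.0 <= volume_ml <= total_ml:
        raise ValueError("volume lies outside the gradient")
    return start_m + (end_m - start_m) * volume_ml / total_ml


def fraction_for_concentration(conc_m, fraction_ml=30.0,
                               start_m=0.05, end_m=0.8, total_ml=20000.0):
    """1-based number of the fraction in which the gradient reaches `conc_m`.

    Assumption: fraction 1 starts exactly when the gradient starts.
    """
    if not start_m <= conc_m <= end_m:
        raise ValueError("concentration lies outside the gradient")
    volume_ml = (conc_m - start_m) / (end_m - start_m) * total_ml
    return max(1, math.ceil(volume_ml / fraction_ml))


if __name__ == "__main__":
    # Concentration window quoted for the marked region in the legend of Fig. 1a.
    for conc in (0.40, 0.49):
        print(f"{conc:.2f} M NaCl is reached in fraction {fraction_for_concentration(conc)}")
```

Under these idealised assumptions the 0.40 to 0.49 M window falls between fractions 300 and 400; real elution positions will be shifted by the dead volume of the column.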

Proteins were checked for purity and identified 
by slab gel electrophoresis in the presence of SDS. 

Determination of Stokes radii. Stokes radii of
gyration, R_s, were determined by analytical gel
filtration on a Sephadex G-50 superfine column
(1.7 × 190 cm) in 0.35 M NaCl/20 mM phosphate
buffer (pH 7.0). The flow rate was 12 ml/h and
the absorption at 230 nm was recorded continuously.
The distribution coefficient, K_D, was calculated
from the void volume (V_0, determined with
Dextran blue (2000)), the total available volume
(V_t, determined with benzamidine hydrochloride),
and the elution volume (V_e). The calibration line
for Stokes radii was obtained by plotting the
inverse error function of (1 - K_D) against R_s as
described by Ackers [17]. The column was
calibrated using the following proteins as markers:
ovalbumin (3.0 nm), chymotrypsinogen A (2.2 nm),
myoglobin (1.9 nm), cytochrome c (1.61 nm) and
bovine trypsin inhibitor (1.45 nm).
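
The calibration described above can be sketched as follows. The distribution coefficient is taken in its usual form, K_D = (V_e - V_0)/(V_t - V_0), and a straight line is fitted to erfinv(1 - K_D) versus R_s for the marker proteins. The marker radii are those listed in the text; the K_D values used in the demonstration are synthetic (generated from an arbitrary, hypothetical line), because the measured elution volumes are not given here. All function names are illustrative.

```python
from math import erf, sqrt
from statistics import NormalDist


def erfinv(x):
    """Inverse error function for -1 < x < 1, via the normal quantile function."""
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / sqrt(2.0)


def distribution_coefficient(v_e, v_0, v_t):
    """K_D from elution volume, void volume and total available volume."""
    return (v_e - v_0) / (v_t - v_0)


def fit_calibration(markers):
    """Least-squares line erfinv(1 - K_D) = intercept + slope * R_s.

    `markers` is a list of (R_s in nm, K_D) pairs for the marker proteins.
    """
    xs = [radius for radius, _ in markers]
    ys = [erfinv(1.0 - k_d) for _, k_d in markers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def stokes_radius(k_d, slope, intercept):
    """Read an unknown Stokes radius (nm) off the calibration line."""
    return (erfinv(1.0 - k_d) - intercept) / slope


if __name__ == "__main__":
    marker_radii_nm = [3.0, 2.2, 1.9, 1.61, 1.45]  # from the text
    # Hypothetical calibration line, used only to fabricate demonstration K_D values.
    demo_slope, demo_intercept = 0.4, 0.1
    markers = [(r, 1.0 - erf(demo_intercept + demo_slope * r)) for r in marker_radii_nm]
    slope, intercept = fit_calibration(markers)
    unknown_k_d = 1.0 - erf(demo_intercept + demo_slope * 1.56)
    print(round(stokes_radius(unknown_k_d, slope, intercept), 3))
```

With real data, `markers` would hold the measured K_D of each marker protein, and `stokes_radius` would be applied to the K_D of protein 7d.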

Filter binding assays. The filter binding assay
described in Ref. 18 was modified according to
Ref. 13. A fixed amount of 3H-labeled DNA and
increasing amounts of protein were incubated in
0.1 × SSC buffer containing 0.25 M NaCl for
15 min at 37 °C. DNA-protein complexes were
collected onto Millipore filters (0.45 µm, Milford,
MA, U.S.A.) which were presoaked for 1 h at
22 °C in 10 mM KCl/1 mM EDTA/5 mM 2-mercaptoethanol/50
µg/ml BSA. The complexes
were washed three times with 3 ml portions of
0.1 × SSC buffer containing 0.25 M NaCl and
quantified by liquid scintillation counting (Beckman
LS 7000). The DNA-binding affinity of the
examined proteins was expressed in percent, referring
to the 100% sample of [3H]DNA without
protein content.
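
The percentage scale used for the filter binding data can be written out as a short calculation. This is only a sketch: the optional background subtraction and the counts in the example are assumptions for illustration, not values from the paper.

```python
def binding_percent(retained_cpm, total_cpm, background_cpm=0.0):
    """DNA retained on the filter, as a percentage of the total [3H]DNA input.

    `background_cpm` (counts of a blank filter) is an assumed refinement;
    leave it at 0.0 to reproduce the plain ratio described in the text.
    """
    if total_cpm <= background_cpm:
        raise ValueError("total counts must exceed the background")
    return 100.0 * (retained_cpm - background_cpm) / (total_cpm - background_cpm)


if __name__ == "__main__":
    # Hypothetical counts: 1800 cpm retained out of 10000 cpm input DNA.
    print(binding_percent(1800.0, 10000.0))
```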

Gel-filtration binding experiments. DNA binding
experiments using size exclusion chromatography
on a Sephadex G-50 superfine column (2 × 50 cm)
were carried out as described in Ref. 14. A fixed
amount of Sulfolobus DNA and protein 7d was
incubated for 15 min at 67 °C in 'polymix' buffer
[16]. 1 ml of the sample was injected onto the
column and comigration of the protein with DNA
was established by analysis of the void volume
peak by SDS gels.

Electron microscopy studies. The formation of
DNA-protein complexes and the preparation of
samples for electron microscopy by adsorption to
mica were performed as described in Ref. 19. Variable
amounts of protein were incubated with double-stranded
plasmid RSF 1010 and single-stranded
φX 174 DNA in a buffer comprising 10
mM triethanolamine-HCl/50 mM KCl/2.5 mM
MgCl2/2.5 mM 1,4-dithiothreitol (pH 7.5). Complexes
were fixed with 0.2% (v/v) glutaraldehyde,
adsorbed to mica and stained with 2% (w/v)
aqueous uranyl acetate. Rotary shadowing was
done with platinum-iridium (80 : 20) at an angle of
about 8°. Electron micrographs were made with a
Philips electron microscope, model EM 480.

Enzymatic digestion with trypsin. The protein
was digested with TPCK-trypsin (enzyme-to-substrate
ratio, 1 : 50) in 100 mM N-methylmorpholine
acetate buffer at pH 8.1 for 2 h at 37 °C, with
gentle stirring. The peptides were separated by
reversed-phase HPLC (RP-HPLC) on a Vydac C18
(201 TPB) column (250 × 4 mm) in dilute aqueous
trifluoroacetic acid using an acetonitrile gradient.

Cleavage with CNBr. Protein 7d (1 mg) was
cleaved with 6 mg CNBr in 70% (v/v) formic acid
for 48 h in the dark under nitrogen at ambient
temperature. The peptides obtained were separated
directly by RP-HPLC on a Vydac C4 (214
TP54) column (250 × 4 mm) with a gradient of
2-propanol in aqueous 0.1% trifluoroacetic acid,
or with a Vydac C18 (201 TPB) column (250 × 4
mm) with an acetonitrile gradient in aqueous
trifluoroacetic acid.

Sequence determination. Automatic sequencing
of the intact protein was done in a liquid phase
sequencer [20] with on-line detection of the PTH-amino
acids [21] by isocratic HPLC employing a
2-propanol HPLC solvent system [22] or in a
pulsed gas-liquid phase sequencer [23] (Applied
Biosystems, model 477A) with on-line detection of
the PTH-amino acids by HPLC using a gradient
system (Applied Biosystems PTH-analyzer, model
120A). Sequence analysis of tryptic peptides was
performed by manual microsequencing employing
the DABITC/PITC double coupling method, and
the amino-acid derivatives were identified by
two-dimensional thin-layer chromatography
[24,25]. DABTH-Leu and DABTH-Ile, which
comigrate on the micro-TLC plates, were identified
by isocratic HPLC [26]. The peptides obtained
from cyanogen bromide cleavage which carried
homoserine residues were sequenced in a solid
phase sequencer employing the homoserine lactone
attachment procedure [27,28].

Amino-acid analysis. Hydrolysis of the protein
and peptides was performed in 100 µl 5.7 M HCl
for 24 h at 110 °C. The amino acids were determined
after precolumn derivatization with o-phthaldialdehyde
by RP-HPLC separation as described
in Ref. 29.
Results and Discussion

The growth of S. solfataricus strain P1, breakage
of cells and isolation of the DNA-binding
proteins were performed as described in the Experimental
procedures. Similar to S. acidocaldarius
cells [12], three molecular weight classes of DNA-binding
proteins of 7, 8 and 10 kDa have been
isolated from S. solfataricus strain P1. The major
component of the 7 kDa class is the DNA-binding
protein 7d, according to the nomenclature used
for the DNA-binding proteins from S. acidocaldarius
[13].

Fig. 1a shows the protein separation on CM-Sepharose
CL-6B. The fractions containing protein
7d and an 8 kDa protein are marked. Further
purification of protein 7d was performed by gel
filtration on Sephadex G-50 and by ion-exchange
chromatography on CM-Fractogel TSK as
described in Experimental procedures (chromatograms
not shown). Fig. 1b shows the purified
protein 7d from S. solfataricus P1 on SDS-PAGE
in comparison to 7 kDa DNA-binding proteins
from S. acidocaldarius.

Stokes radii of gyration 

The degree of asymmetry and oligomerisation
of proteins are easily determined by analytical gel
filtration [17]. This procedure allows the use of
low protein concentration in order to avoid
artefacts such as protein aggregation. The relation
between the Stokes radius, R_s, and the quaternary




Fig. 1. (a) Separation of the DNA-binding proteins on CM-Sepharose
CL-6B. Pooled fractions for protein 7d and an 8
kDa protein are marked. The NaCl concentration was increased
from 0.40 M to 0.49 M in phosphate buffer (pH 6.0)
within the marked region. (b) Protein 7d derived from S.
solfataricus (this paper) in comparison to 7 kDa proteins from
S. acidocaldarius (this paper and Ref. 15). Lanes 1 and 6 show
TP 50 marker proteins from S. solfataricus; lane 2, protein 7b
from S. acidocaldarius; lane 3, protein 7c from S.
acidocaldarius; lane 4, protein 7d from S. solfataricus; lane 5,
protein 7d from S. acidocaldarius.
TABLE I

THE STOKES RADII OF GYRATION OF THE 7 kDa
DNA-BINDING PROTEINS FROM S. ACIDOCALDARIUS
AND S. SOLFATARICUS DETERMINED BY
ANALYTICAL GEL FILTRATION [17]

The frictional ratio (f/f0) is calculated from the ratio of R_s
and the radius of the equivalent sphere, R_min.

              R_s (nm)    f/f0
                          monomer    dimer    tetramer

(a) Sac7d     1.53        1.20       0.95     0.75
(b) Sso7d     1.56        1.21       0.96     0.73

structure of proteins is the frictional ratio, f/f0,
which can be calculated from the experimental R_s
value and the theoretical minimal radius, R_min,
for a given molecular weight. Table I shows that
in 0.35 M NaCl the 7 kDa proteins are monomers,
like the 7 kDa proteins from S. acidocaldarius.
This is also in accordance with results from 1H-NMR
experiments (data not shown).
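
The reasoning behind the monomer assignment can be made explicit with a short calculation: f/f0 is the measured Stokes radius divided by the radius of a compact sphere of the same mass, and a value below 1 is physically impossible, which rules out the dimer and tetramer. The sketch below assumes a partial specific volume of 0.75 cm³/g, a typical protein value that is not stated in the text, and uses the calculated Mr of 7149 for Sso7d; function names are illustrative.

```python
from math import pi

AVOGADRO = 6.02214076e23  # molecules per mole


def minimal_radius_nm(molar_mass, partial_specific_volume=0.75):
    """Radius (nm) of an anhydrous sphere with the given molar mass (g/mol).

    `partial_specific_volume` is in cm^3/g; 0.75 is an assumed typical value.
    """
    volume_nm3 = molar_mass * partial_specific_volume / AVOGADRO * 1e21  # cm^3 -> nm^3
    return (3.0 * volume_nm3 / (4.0 * pi)) ** (1.0 / 3.0)


def frictional_ratio(stokes_radius_nm, monomer_mass, subunits=1,
                     partial_specific_volume=0.75):
    """f/f0 for an assumed oligomeric state with `subunits` identical chains."""
    r_min = minimal_radius_nm(monomer_mass * subunits, partial_specific_volume)
    return stokes_radius_nm / r_min


if __name__ == "__main__":
    for name, n in (("monomer", 1), ("dimer", 2), ("tetramer", 4)):
        ratio = frictional_ratio(1.56, 7149.0, subunits=n)
        verdict = "possible" if ratio >= 1.0 else "excluded (f/f0 < 1)"
        print(f"Sso7d {name}: f/f0 = {ratio:.2f} -> {verdict}")
```

With the assumed partial specific volume, the monomer and dimer values come out close to those listed for Sso7d in Table I, and only the monomer gives f/f0 above 1.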

Filter binding assays 

The original procedure [18] for filter binding
assays used rather low ionic strength buffer (0.1 ×
SSC), which allows the nonspecific binding of
basic proteins to nucleic acids by electrostatic
interactions. In order to avoid this, the NaCl
concentration of the binding buffer was increased
to 0.25 M in 0.1 × SSC. It has been shown that at
this ionic strength, basic proteins like lysozyme,
cytochrome c or E. coli ribosomal proteins do not
bind to DNA due to their basicity only [13]. Well
established DNA-binding proteins like HU from
E. coli and DNA-binding protein II from Bacillus
stearothermophilus showed, under these buffer conditions,
a binding capacity of 18% to 20% at a
protein/DNA ratio of 25. The whole set of
DNA-binding proteins from S. acidocaldarius
clearly demonstrated binding capacities in the
range of 5% to nearly 80% under the same conditions
[12-14]. The filter binding assay of protein
7d (Table II) resulted in a DNA-binding affinity
of about 18% binding capacity, referring to the
100% sample of [3H]DNA without protein content,
at a protein/DNA ratio of 25. This value is slightly
higher than that of the homologous protein from
S. acidocaldarius, which can be explained by the
different amount of methylated lysines.

The results of the size exclusion experiments 
confirm qualitatively those from filter binding as- 
says. If the protein/DNA ratio is increased drasti- 
cally, free protein is fractionated by the Sephadex- 
G50 superfine column after the void volume peak, 
which contained the protein/DNA complex. The 
same results were obtained using either Sulfolobus 
or E. coli DNA. In the latter case, incubation 
temperature was decreased to 37 °C.

Electron microscopy 

Fig. 2 presents the electron micrographs of 
protein 7d in complexed formation with both dou- 
ble- and single-stranded DNA. The formation of 
the protein-DNA complex results in highly con- 
densed DNA-protein clusters. With increased pro- 
tein/DNA ratios, the isolated clusters on the DNA 
merge more and more into a large central pro- 
tein/DNA cluster, surrounded by loops of free 
DNA. A preference for single- or double-stranded 
DNA was not found. Similar structures have been 
observed for the 7 kDa proteins from S. acidocaldarius,
which represent a very homogeneous group
of five DNA-binding proteins [14,15]. All these
highly similar proteins have been shown to interact
specifically with single- and double-stranded
DNA, although a sequence specificity has not
been observed [19].



TABLE II 

MILLIPORE FILTER BINDING ASSAYS 

Increasing amounts of protein were incubated with 0.5 µg 3H-labeled DNA in the presence of 0.25 M NaCl in 0.1 × SSC. The
DNA-binding affinity of protein 7d from S. solfataricus is shown. 100% affinity is equivalent to the total amount of [3H]DNA.

Protein/DNA ratio (w/w) 1 5 10 15 20 25
DNA-binding affinity (%) 1 6 ^ ^1 ^
Fig. 2. Electron micrographs of nucleoproteins formed with
protein 7d. Some complexes formed with (ss) DNA (φX 174)
are marked with arrows. Clusters of bound protein on (ds)
plasmid DNA RSF 1010, surrounded by free DNA, could be
observed.



Amino-acid sequence determination 

The complete amino-acid sequence of protein
7d from the archaebacterium S. solfataricus and
the strategy employed for the sequence determination
are shown in Fig. 3. The amino-acid composition
derived from the sequence is in good agreement
with that obtained from the total hydrolysis
of the protein (Table III). As derived from the
amino-acid sequence, protein 7d contains modified
lysines which were identified as monomethylated
residues partially modified at positions 4, 6,
60, 62 and 63 and fully methylated at position 62
(see below).
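
The agreement between sequence and composition can be checked with a few lines of code. The sketch below assumes the 63-residue sequence as read from Fig. 3 (one-letter code; residues lost in the scan are taken from the CNBr peptide CB3 and the recombinant Sso7d listing elsewhere in this file, so the string should be treated as an assumption) and uses standard average residue masses. It reproduces the "sequence" column of Table III and a mass close to the calculated Mr of 7149 for the unmethylated chain.

```python
from collections import Counter

# Assumed reading of Fig. 3, residues 1-63, methylation ignored.
SSO7D = "ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK"

# Standard average residue masses (Da); one water is added for the free termini.
RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER = 18.0153


def composition(sequence):
    """Residue counts keyed by one-letter code."""
    return Counter(sequence)


def average_mass(sequence):
    """Average molecular mass (Da) of the unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


if __name__ == "__main__":
    counts = composition(SSO7D)
    print(len(SSO7D), "residues,", counts["K"], "lysines")
    print("average mass:", round(average_mass(SSO7D), 1))
```

Each Nε-methyl group would add about 14 Da to this value, so the mass of the native, partially methylated protein is slightly higher than the figure calculated from the sequence alone.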

Occurrence of modified amino acids in the protein 

In the PTH-amino acid identification system of
the liquid [21,22] and gas-liquid phase sequenator
[23], a new peak was observed in steps 4, 6, 60, 62
and 63. This modified derivative was identified
on-line as ε-monomethyl-PTH-lysine in comparison
with an authentic reference.



  1  Ala-Thr-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-Ile-Ser-Lys-Ile-Lys-Lys-

 22  Val-Trp-Arg-Val-Gly-Lys-Met-Ile-Ser-Phe-Thr-Tyr-Asp-Glu-Gly-Gly-Gly-Lys-Thr-Gly-Arg-

 43  Gly-Ala-Val-Ser-Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Gln-Met-Leu-Glu-Lys*-Gln-Lys*-Lys*

[Rows SEQ, TRY and CB beneath the sequence mark the stretches covered by sequencing of the intact protein, by tryptic peptides T1-T11 and by CNBr peptides CB1-CB3.]

Fig. 3. Amino-acid sequence of DNA-binding protein 7d from S. solfataricus. Sequences of individual peptides and intact protein are
indicated as follows: sequenced automatically using a pulsed gas-liquid phase sequencer [23], or a liquid-phase sequencer
[20-22]; manual liquid-phase DABITC/PITC double coupling method [24,25]; solid-phase sequencing after homoserine-lactone
attachment to aminopropyl glass (APG) [27,28]. TRY and CB indicate peptides derived from digestion with trypsin or cleavage
with CNBr. Lys* indicates the Nε-monomethylated lysines.
Furthermore, experiments with lysine derivatives
showed that this unusual amino acid comigrates
with the authentic o-phthaldialdehyde derivative
of ε-monomethyllysine in the amino-acid
analyzer [15]. Fig. 4 shows the HPLC separation





TABLE III

AMINO-ACID ANALYSIS OF THE DNA-BINDING PROTEIN
7d FROM S. SOLFATARICUS

n.d., residues not determined by amino-acid analysis.

          Number of residues derived by amino-acid:
          sequence    analysis (a)

Asp       3           2.6
Asn
Glu       7           9.0
Gln       2
Ser       3           2.4
Gly       7           7.6
Thr       3           2.4
Arg       2           2.3
Ala       3           3.0
Tyr       2           1.7
Trp       1           n.d.
Met       2           1.2
Val       5           5.6
Phe       2           1.6
Ile       3           2.9
Leu       3           3.1
Lys (b)   14          12.6
Pro       1           n.d.

(a) The values given are not corrected for destruction of amino
    acids or incomplete hydrolysis.
(b) Lys refers to the sum of lysine and monomethylated lysine.
    Due to the presence of incompletely modified lysines, the
    value for lysines by amino-acid analysis cannot be calculated
    precisely.



of a standard amino-acid mixture plus ε-monomethylated
lysine after o-phthaldialdehyde derivatization.
The additional peak which migrates between
threonine and arginine derivatives was determined
to be ε-monomethyllysine, whereas ε-dimethyllysine
migrated after the arginine derivative
and ε-trimethyllysine before glycine.

Fig. 4. (a) Separation of 100 pmol of a reference amino-acid
mixture containing Nε-monomethylated lysine, after ortho-phthaldialdehyde
precolumn derivatization, by reversed-phase
HPLC using a column (250 × 4 mm) filled with Shandon
Hypersil ODS 5 µ material. Buffer A was 12.5 mM Na2HPO4
(pH 7.2), and buffer B was 3% tetrahydrofuran in methanol
[27]. The peak which appears between threonine and arginine
comigrates with authentic ε-monomethylated lysine (K*). (b)
The amino-acid composition of protein Sso7d after total hydrolysis.
The separation of the amino acids was as described in
Fig. 4a. The characteristic peak for Nε-monomethyllysine
(K*) appears at the same position in the chromatogram. (c)
The amino-acid composition of the C-terminal peptide (CB3)
after acid hydrolysis. The separation of the amino acids was as
described in Fig. 4a. The peak marked with an asterisk shows
the ε-monomethylated lysine residue.

Fig. 4b shows the separation of the amino-acid
derivatives of protein 7d produced after acid
hydrolysis. Between the arginine and
threonine o-phthaldialdehyde derivatives, the ε-monomethyllysine
of the hydrolysate of the
DNA-binding protein 7d can be identified.

Separation of tryptic peptides and N-terminal sequence
region

Fig. 5 demonstrates the separation of the tryptic
peptides by RP-HPLC with a Vydac C18 column.
Some peptides with the same amino-acid composition
except for the lysine content elute at different
retention times. This effect is probably caused by
the different degree of methylation of lysine residues.
Sequence information and o-phthaldialdehyde-amino-acid
determination demonstrate that
the peptides T1_2 and T1_4 have Lys-4 modified,
with the sequence Ala-Thr-Val-Lys* (pos. 1-4,
see Fig. 3), while peptide T1_1 contains an unmodified
lysine residue with the sequence Ala-Thr-Val-Lys.
Peptide T1_3 is a mixture of the
peptides T1_1 and T1_2. Peptide T2, Phe-Lys* (pos.
5-6, see Fig. 3) is found in one position only. The
degree of methylation, derived from the sequence
of the intact protein and estimated by peak height,
is approx. 90% for Lys-4 and 83% for Lys-6.

The appearance of peptide T7 (pos. 28-39),
which does not possess modified lysines, at two
different positions may be due to partial oxidation
of methionine. The degree of modification at Lys-60
appears to be the crucial factor for the elution
of peptide T10 (pos. 52-60) at different positions.
Amino-acid analysis of this peptide has shown
that peptides T10_1 and T10_2 differ only at Lys-60,
namely T10_1 contains unmodified lysine, while
Lys-60 in T10_2 is monomethylated.

C-terminal peptide regions

The peptides produced after CNBr cleavage
were separated by RP-HPLC either on a Vydac C4
or C18 column as described in Experimental procedures.
The C-terminal peptide (CB3) (pos.
58-63) was isolated by using the Vydac C18 column
and the homoserine peptides CB1 (pos. 1-28)
and CB2 (pos. 29-57) by a Vydac C4 column.
From the sequence determination and amino-acid
analysis (Fig. 4c) of CB3, the following primary
structure was derived: 58-Leu-Glu-Lys*-Gln-Lys*-Lys*-63.
The degree of monomethylation, as
estimated by peak height, is approx. 90%, 100%
and 58% for lysine residues 60, 62 and 63, respectively.
The number of lysine residues in the C-terminal
peptide was substantiated by fast atom
bombardment mass spectrometry [30].
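
The CNBr fragment boundaries quoted above follow directly from the positions of the two methionines. A minimal in-silico sketch is given below; it assumes the 63-residue sequence as read from Fig. 3 (one-letter code, methylation ignored), and the function name is illustrative. In the real reaction the C-terminal methionine of each internal fragment is converted to homoserine (lactone), which the sketch does not model.

```python
# Assumed reading of Fig. 3, residues 1-63, methylation ignored.
SSO7D = "ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK"


def cnbr_fragments(sequence):
    """Cleave after every Met; return (start, end, peptide) with 1-based positions."""
    fragments = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue == "M":
            fragments.append((start + 1, index + 1, sequence[start:index + 1]))
            start = index + 1
    if start < len(sequence):
        # C-terminal fragment, which carries no homoserine.
        fragments.append((start + 1, len(sequence), sequence[start:]))
    return fragments


if __name__ == "__main__":
    for number, (first, last, peptide) in enumerate(cnbr_fragments(SSO7D), start=1):
        print(f"CB{number}: pos. {first}-{last}  {peptide}")
```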




Fig. 5. Separation of the 20 nmol peptides derived by tryptic digestion of protein Sso7d by RP-HPLC. The peptides were
chromatographed on a Vydac C18 (201 TPB) column (250 × 4 mm) in a solvent system of 0.1% trifluoroacetic acid/acetonitrile. The
gradient applied was 100% A for 10 min, 0-50% B in 180 min, 50-100% B in 20 min, 100% B for 5 min and 100-0% B in 5 min.
Measurements were made at 220 nm, 0.16 arbitrary units (full scale), at a flow rate of 1.0 ml/min.
Because of the methylation of the lysines found
here in the S. solfataricus 7d protein (Sso7d), the
homologous 7d protein derived from S. acidocaldarius
(Sac7d) was also examined for lysine modifications
not previously identified [14]. We reinvestigated
the Sac7d protein by liquid phase sequencing
and isolation of the C-terminal CNBr
fragment, and found Nε-monomethylated lysines
at positions 4 and 6 (approx. 20% and 50%, respectively).
However, in contrast to Sso7d, the
corresponding Sac7d protein showed no modified
lysine residues in the C-terminal sequence region.

Secondary structure predictions

The secondary structure of protein 7d has been
predicted from the amino-acid sequence. Four
different prediction methods according to Ref. 31
were used to calculate the conformational states
(Fig. 6). This protein possesses a higher amount of
α-helical domains (about 35%) as compared to other
7 kDa DNA-binding proteins from S. acidocaldarius,
for which only about 15% helix content was
calculated.

Fig. 6. Secondary structure of DNA-binding proteins 7d from S. acidocaldarius and S. solfataricus as predicted by four different
methods. The symbols represent residues in α-helical, β-sheet, β-turn and random coil conformations. The line Avg
summarizes the secondary structure obtained when at least three of the four predictions are in agreement.
The amino-acid sequences of the proteins are shown at the bottom line in the one-letter code. Sch, method according to Burgess et al.
[33]; C&F, Chou and Fasman [34]; Nag, Nagano [35]; Rob, Robson and Suzuki [36].

Fig. 7. Structural homology between the 7d DNA-binding protein from S. solfataricus and the DNA-binding proteins 7a, 7b, 7e and
7d from S. acidocaldarius cells. The alignment scores (SD units) calculated by the program ALIGN [32] using the standard mutation
data matrix (100 random runs and a break penalty of 20) are:

7d S. solfataricus - 7a S. acidocaldarius: 30.93. 7d S. solfataricus - 7d S. acidocaldarius: 32.63.
7d S. solfataricus - 7b S. acidocaldarius: 29.54. 7d S. solfataricus - 7e S. acidocaldarius: 30.23.

Gaps are shown as ... .

Homology to other DNA-binding proteins 

By sequence comparison, we found a strong
degree of homology between protein 7d from S.
solfataricus and the proteins from the 7 kDa group
from the archaebacterium S. acidocaldarius (Fig.
7), using the programme ALIGN [32]. No significant
homology between protein 7d from S. solfataricus
and DNA-binding proteins from other
organisms has been found.
The crystal structure of
the hyperthermophile
chromosomal protein Sso7d
bound to DNA

Sso7d and Sac7d are two small (~7,000 Mr), but abundant, chromosomal
proteins from the hyperthermophilic archaeabacteria
Sulfolobus solfataricus and S. acidocaldarius respectively. These
proteins have high thermal, acid and chemical stability. They
bind DNA without marked sequence preference and increase
the Tm of DNA by ~40 °C. Sso7d in complex with GTAATTAC
and GCGT(iU)CGC + GCGAACGC was crystallized in different
crystal lattices and the crystal structures were solved at high resolution.
Sso7d binds in the minor groove of DNA and causes a
single-step sharp kink in DNA (~60°) by the intercalation of the
hydrophobic side chains of Val 26 and Met 29. The intercalation
sites are different in the two complexes. Observations of this
novel DNA binding mode in three independent crystal lattices
indicate that it is not a function of crystal packing.

a
               10          20          30
Sso7d MATVK FKYKG EEKEV DISKI KKVWR VGKMI SFTYD

How sequence-general DNA binding proteins bind to DNA
is a fundamental question for understanding genome structure.
However, few examples of structures of sequence-general DNA
binding proteins bound to DNA are known. The high thermal,
acid and chemical stability associated with Sso7d and Sac7d (Fig.
1a) makes them an attractive system for structural, thermodynamic
and DNA-binding studies. Sac7d and Sso7d have
unfolding temperatures of greater than 90 °C (at pH 7.5, 0.3 M
KCl) and both are acid stable with Tm's of >60 °C at pH 0. The


Sac7d -VK-- ----- ----- -T--- ----- ----V -----

         40          50          60
Sso7d EGGGK TGRGA VSEKD APKEL LQMLE KQKK
      (secondary structure labels: β5, α-helix)
Sac7d DN-.- ----- ----- ----- -D--A RAERE KK

Fig. 1 a, Amino acid sequences of recombinant
Sso7d and Sac7d. b, Ribbon diagram of the
Sso7d-GCGT(iU)CGC + GCGAACGC complex. All side
chains of Sso7d are shown. The four bridging water
molecules are shown as large purple spheres. DNA is
colored red for the first two base pairs and green for the
remaining six base pairs, separated by the intercalating
amino acids (yellow). c, Superposition of three Sso7d
structures from the Sso7d-GCGT(iU)CGC + GCGAACGC
complex (yellow), the Sac7d-GCGATCGC complex
(green) and the NMR solution structure (red).
Fig. 2 a, Stereoscopic surface drawing of the electrostatic
potential of the Sso7d-GCGT(iU)CGC +
GCGAACGC complex. The charge distribution of
Sac7d was calculated in the absence of DNA.
Sso7d is positively charged (+6), resulting from 14
lysines, two arginines, seven glutamic acids and
three aspartic acids on the protein surface.
However, the complexes are negatively charged
(-8) overall due to the additional 14 negative
DNA phosphate charges. There is no apparent
correlation between the monomethylation sites
of lysines (Lys 5 and Lys 7) and the binding interface.
Four bridging waters are found in the space
between the protein and DNA. b, c, Detailed
views at the protein-DNA interface of the
Sso7d-GCGT(iU)CGC + GCGAACGC (left) and
Sso7d-GTAATTAC (right) complexes. Selected
side chains of Sso7d (red), three DNA base pairs
(green) and four bridging water molecules (purple)
are shown.
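
The charge bookkeeping in the legend above can be written out explicitly. The sketch counts only the side chains and phosphates named in the legend; ignoring histidines, the chain termini and any pH dependence is a simplifying assumption, and the function names are illustrative.

```python
def protein_net_charge(lysines, arginines, glutamates, aspartates):
    """Formal net charge from fully ionised side chains only.

    Assumption: termini and other titratable groups are neglected.
    """
    return (lysines + arginines) - (glutamates + aspartates)


def complex_net_charge(protein_charge, dna_phosphates):
    """Net charge of the complex, counting -1 per DNA phosphate."""
    return protein_charge - dna_phosphates


if __name__ == "__main__":
    sso7d = protein_net_charge(lysines=14, arginines=2, glutamates=7, aspartates=3)
    print("Sso7d:", sso7d)
    print("Sso7d-DNA complex:", complex_net_charge(sso7d, dna_phosphates=14))
```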



solution structures of Sso7d and Sac7d, determined by NMR
analyses, are similar to each other and they consist of an incomplete
five-stranded β-barrel capped by an amphiphilic α-helix
abutting the β3-β4-β5 strands.

Both proteins bind to DNA without marked sequence preference
and increase the Tm of DNA by ~40 °C. However, their
DNA-binding mode has remained unclear until recently.
Baumann et al. proposed that the β3-β4-β5 sheet is the putative
DNA binding surface. McAfee et al. have shown that Sac7d
binds to DNA with an average ratio of four base pairs per
monomer of Sac7d with no cooperativity. Circular dichroism
data also indicated that Sac7d induces a sequence-dependent
cooperative structural transition in DNA. Another unusual property
is the ribonuclease (RNase) activity associated with Sso7d,
which has been called ribonuclease P2. However, similar studies
on Sac7d did not produce conclusive evidence of any RNase activity
(unpublished results of J.W.S.).

We recently determined the crystal structures of two
Sac7d-DNA complexes which revealed an unexpected DNA
minor groove binding mode of Sac7d with the DNA duplex
sharply kinked. Here we present the results of a parallel study on
the structure determination of two Sso7d-DNA complexes. The
complexes were crystallized in two new crystal lattices which
afford us an excellent opportunity to compare the structure and
DNA binding properties of not only the same protein (Sso7d) in
different environments, but also different proteins (that is, Sso7d
versus Sac7d). The structures are also compared with a recent
Sso7d-DNA structure by NMR analysis.

Overall structure of the complex

The crystal structures of two Sso7d-DNA complexes,
Sso7d-GCGT(iU)CGC + GCGAACGC (iU = 5-iodo-deoxyuridine)
and Sso7d-GTAATTAC, have been solved and well-refined at high
resolution (Table 1). All φ/ψ angles of the Ramachandran plot and
other conformational parameters in both complexes are within the
acceptable regions. The Sso7d binding sites in DNA are sharply
kinked and located at different places in the two complexes: at the
C2pG3 step in the Sso7d-GCGT(iU)CGC + GCGAACGC complex
(Fig. 1b) and at the A3pA4 step in the Sso7d-GTAATTAC
complex, respectively. The protein covers four bases and
significantly widens the DNA minor groove. The other end of the DNA
duplex remains B-DNA-like. These two complexes have different
crystal packing interactions, indicating that the observed novel
DNA binding mode is not a result of crystal packing and is an
accurate reflection of the preferred protein-DNA interaction.

The structures of the bound Sso7d in both complexes are very
similar to each other with an r.m.s.d. of 0.51 Å (using Cα atoms of
residues 2-60) and are generally similar to that of the free Sso7d
determined by 2D-NMR analysis with an r.m.s.d. of ~2.10 Å
(using Cα atoms of residues 2-60). Some differences exist in the
orientation of the β1-β2 β-hairpin and in the conformations of
the C-terminal α-helix (Fig. 1c).
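
For readers unfamiliar with the r.m.s.d. figures quoted above, the following is a minimal sketch of how a root-mean-square deviation over matched Cα positions is computed. It assumes the two coordinate sets have already been superimposed and are listed in the same residue order; the function name and the toy coordinates are hypothetical and are not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    The coordinate sets are assumed to be pre-superimposed and matched
    atom by atom (for example, C-alpha atoms of residues 2-60).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: the second set is the first shifted by 0.5 along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 0.5, y, z) for x, y, z in reference]
print(f"{rmsd(reference, shifted):.2f}")
```

A full comparison would first find the optimal rigid-body superposition (for example with the Kabsch algorithm) before applying this formula; that step is omitted here.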

The molecular surface of Sso7d is irregular with numerous
ridges and valleys (Fig. 2a). The excellent matching of shapes and
charges between Sso7d and DNA in the complexes is evident. A
long groove is visible which is occupied by DNA in the complex.
There is also a significant crater created by the crossing of the
β3-β4-β5 triple-stranded β-sheet and the β1-β2 β-hairpin.

Sso7d has an OB-fold topology with a small hydrophobic core
of only 11 residues (<25% solvent accessibility). Four glycines
(Gly 10, 27, 38 and 39) are located in the loop regions. Many
hydrophobic amino acids are solvent exposed (>45% solvent
accessibility). The surface hydrophobic amino acids Trp 24, Val
26, Met 29, and Ala 45 are involved in DNA binding contacts.

Fig. 3 a, Detailed local structures
at the protein-DNA interface
of the Sso7d-GCGT(iU)CGC +
GCGAACGC complex. Selected
side chains of Sso7d are shown.
b, Schematic diagram summarizing
all the important Sso7d-DNA
contacts. The filled, open and
dashed arrows represent direct
hydrogen bonds/salt bridges, van
der Waals close contacts, and
potential hydrogen bonds/salt
bridges respectively.



There are two 3₁₀-turns that allow the protein's main chain to
change direction abruptly. The C-terminal helix is solvent
exposed. A notable feature is the triple-stranded β-sheet
(β3-β4-β5) whose interactions with DNA are summarized in Fig. 3.

Bound DNA has a sharp kink 

The DNA is severely kinked (Fig. 4) by the bound Sso7d, as in the
Sac7d-DNA structures. This type of DNA kink has been observed
in the complexes of TBP and LEF-1, SRY (two HMG-box
containing proteins) with their cognate specific DNA sequences,
but is different from that induced by proteins that bend DNA more
smoothly. The induced local DNA deformation is similar among
different protein-DNA complexes, despite the different protein
motifs. It should be noted that the ~61° single-step kink in the
Sso7d-DNA and Sac7d-DNA complexes is the largest among all
known structures of protein-DNA complexes. The solution
structure of the Sso7d-CTAGCGCGCTAG complex has been analyzed
by NMR recently and the DNA was found to be bent by 30°,
significantly lower than that found in the crystal structures. The
difference may be the result of the NMR refinement using a limited
number of observed NOE crosspeaks between Sso7d and DNA due
to the fast exchange between the free and bound DNA/protein.

The bound DNA has a varying degree of helix unwinding at steps
surrounding the intercalation sites (-14° at C2pG3, at G3pA4
and -12° at T4piU5). There is also a slight roll (11°) between the
G3-C14 and A4-T13 base pairs, thus creating a total bend of 72°.
Many nucleotides surrounding the wedge site adopt the less-common
C3'-endo (N-type) sugar puckers: C2 (N), G3 (S), T4 (N) and
iU5 (N) in one strand and G15 (S), C14 (N), A13 (N) and A12 (S)
in the other strand. The Sso7d-GTAATTAC complex has the same
pattern.

The DNA distortion seen in the complex described here most
likely represents the structural transition observed by McAfee et al.
using CD spectroscopy for the Sac7d system and the large heat
capacity change upon DNA binding observed by Lundback et al.
Such a structural transition (unwinding and/or bending) is
induced in DNA by Sac7d which is 'cooperative' in the sense that it
is necessary to have two proteins bound within a specified distance
(for example, 5 base pairs in duplex poly(dA-dT)) before the
transition occurs. The inherent resistance to the transition is
apparently negligible in short DNA sequences. Our preliminary 1D-NMR
titration of Sac7d to cisplatin-lesioned DNA indicates a tight
binding between Sac7d and the pre-kinked DNA, supporting the novel
binding mode observed in the crystal structures (unpublished
data).

Protein-DNA interface

The binding of Sso7d to the minor groove of DNA involves a large 





Fig. 4 Stereoscopic view of the intercalation
sites. The local structures of the
two Sso7d-DNA complexes are superimposed.
The DNA octamer is kinked
61° at the C2pG3 step in the
Sso7d-GCGT(iU)CGC + GCGAACGC
complex and 52° at the A3-A4 step in
the Sso7d-GTAATTAC complex. The
sharp kink is due to the intercalation of
Val 26 and Met 29 amino acid side
chains into DNA base pairs from the
minor groove direction, widening the
minor groove at this step. The insertions
of β4-Met 29 and β3-Val 26 amino
acid side chains are ~1.5 Å deep. The
side chain of Met 29 lies close to the
base pair with the S-CH₃ moiety
wedged between the C14 and G15
bases. Similarly the side chain of Val 26
is wedged between the C2 and G3
bases, with each of the γCH₃ groups
pointing toward a base.

binding surface area of about 20 Å x
20 Å (Fig. 1a). A significant component
of the free energy of binding is
due to non-electrostatic interactions,
made in large part by Trp 24, Val 26
and Met 29 (Fig. 4). In addition to the
obvious role of the β4-Met 29 and
β3-Val 26 amino acids, the single tryptophan
(β3-Trp 24) plays multiple roles.
First, its bulky ring fills up the space
between DNA and Sso7d. Second, its
indole NH group forms a specific
hydrogen bond (2.93 Å) to the N3 of
the G3 base, anchoring G3 in its open
(unstacked) position. Ala 45 also
makes a close van der Waals contact
with the deoxyribose of C14, suggesting
the requirement of a small
hydrophobic side chain of alanine at
position 45. Ser 31 receives a hydrogen
bond (2.87 Å) from the N2 amino
group of the G3 nucleotide.
Interestingly, in the Sso7d-GTAATTAC
complex Ser 31 forms a hydrogen
bond to the sulfur atom of Met 29.

The guanidinium group of Arg 43
is hydrogen bonded to the O2 atom
of iU5. Arg 43 is held in its place with
the aid of Tyr 8 whose aromatic ring
is stacked on the deoxyribose of A13.
The phenolic OH group of Tyr 8 is linked to the N3 of A13
through a bridging water. The hydrogen bond between Arg 43 and
the O2 atom of a thymine appears to be important and may
determine the polarity of the Sso7d binding mode. The structure of the
Sso7d-GCGT(iU)CGC + GCGAACGC complex shows that the
Arg 43 of Sso7d is hydrogen bonded to iU5 of the TT-strand, not
to the AA strand. Therefore a combination of the specific
interaction between a guanine base and Ser 31, and the hydrogen bond
between Arg 43 and iU5-O2, may be important in favoring the
intercalation site at the C2pG3 step in this complex.

The formation of the complex is accomplished by specific
hydrogen bonds/salt bridges (Fig. 3). The number of salt bridges
between the protein and DNA is in excellent agreement with the
five ionic interactions predicted by the salt back-titrations of the
Sac7d complex using the theory of de Haseth et al. A somewhat
smaller value has been determined by salt-dependent
isothermal titration calorimetry on the binding between Sso7d
and poly(dG-dC).

An important question is how do Sso7d and Sac7d bind to
DNA in a sequence-general manner. The answer may lie in the
bridging water molecules found in the buried cavity located
between protein and DNA (Fig. 2b,c). This cavity permits the
G-C base pairs to be bound without steric clash due to the
additional N2 amino groups, thus endowing the protein with a
property required for its sequence-general binding to DNA. For
example, in the Sso7d-GCGGTCGC + GCGACCGC complex
(which has a G-C base pair, instead of an A-T base pair, at the
fourth position in the sequence), we observed fewer intervening
water molecules with a concomitant movement of DNA base
pairs (unpublished results). It is interesting to note that bridging
water molecules play an important role in modulating the
sequence-general binding of Sac7d and Sso7d by acting as filler,
whereas they play an entirely different role as specific linkers


Table 1 Crystal and refinement data of two Sso7d-DNA complexes

                                Sso7d +             i-dU-02    i-dU-05    Sso7d +
                                GTAATTAC                                  GCGT(iU)CGC
                                                                          + GCGAACGC
Crystal data
  a (Å)                         47.60               47.52      47.78      46.87
  b (Å)                         50.77               50.76      50.91      49.67
  c (Å)                         42.06               41.97      42.03      37.65
  Resolution (Å)                2.0                 2.0        2.0        1.7
  # reflections (>1.0 σ(F))     7,607               7,499      7,669      11,959
  Temperature (°C)              20                  20         20         -150
  Rmerge (%)                    7.53                6.37       7.33       7.37
  Completeness (%)              94.1                92.9       95.7       84.3
  Completeness at highest
    shell for >2.0 σ(F) (%)     83.0 (2.0-2.1 Å)                          90.6 (1.70-1.78 Å)
  Wilson B-factor (Å²)          32.6                29.7       32.1       17.8
  Mean overall
    figure of merit             0.83

Refinement data
  # reflections (>2.0 σ(I))     5,682                                     9,488
  Rworking/Rfree (10% data)     0.168/0.268                               0.203/0.283
  R.m.s.d. bond distance (Å)
    (Sso7d/DNA)                 0.010/0.007                               0.014/0.009
  R.m.s.d. bond angle (°)
    (Sso7d/DNA)                 1.37/1.20                                 1.81/1.34
  No. of atoms
    (Sso7d/DNA)                 510/322                                   502/322
  No. of waters                 99                                        114

between protein and DNA in defining the sequence specificity in
the Trp repressor-DNA recognition.

Biological implication 

The structures of the Sso7d-DNA and Sac7d-DNA complexes
offer new insights into the possible role of several classes of DNA
binding proteins in transcription regulation. Some of those
proteins, including TBP, SRY, LEF-1 and PurR, bind in the
minor groove of DNA and kink the DNA duplex to a different
degree. Additionally we noted a possible structural alignment
between Sso7d/Sac7d and the cold shock proteins
CspA/CspB. Both CspA and CspB are related to a class of
proteins called Y-box proteins, which have a wide-spread and highly
conserved nucleic acid-binding motif occurring from bacteria to
humans. Therefore this structural alignment between
Sso7d/Sac7d and CspA/CspB may be significant in understanding
the Y-box proteins.

The new DNA binding mode of Sso7d/Sac7d may also offer a
clue for understanding the packaging of DNA in archaebacteria.
Several models of the polymeric Sso7d-DNA complex with
different protein/DNA ratios can be constructed by using the
structure of the complex observed in the crystals. Previously we
presented a model in which the DNA is maximally loaded with
Sac7d proteins. Additional modeling studies showed that if the
number of base pairs per protein monomer is increased (for
example, to 10 base pairs per protein), many possibilities for
DNA condensation may exist (data not shown).

Our study augments the understanding of chromatin
structure in archaea. On the one hand, histones or histone-like
proteins (for example, HMf) form nucleosomes. On the other
hand, Sso7d/Sac7d may bind to DNA in the minor groove and
form higher ordered structures. Thus two different types of DNA
compaction mechanisms may be possible in the Archaea: the
mechanism described here with Sac7d/Sso7d which may be
representative of the Crenarchaeota, and a nucleosome-like
structure for the HMf-class of proteins found in the Euryarchaeota.

Interestingly, the bacterial HU protein has a different way of
forming chromatin structure. The crystal structure of the complex
between the integration host factor (IHF) and DNA revealed that
IHF induces two prominent kinks in the bound DNA, forming a
U-turn, by the partial intercalation of a proline from each of the two
long β-hairpins which wrap around the DNA. The sequence and
structural homology between IHF and HU suggest that HU may
organize chromatin using a minor-groove binding mode through
intercalation.

Methods 

The purified Sso7d protein was dialyzed against de-ionized water and
lyophilized. The complexes were crystallized using the vapor diffusion
method. The Sso7d-GTAATTAC complex and the two iodo derivatives
were crystallized from 1.3 mM Sso7d, 1.3 mM DNA duplex, 2 mM Tris-Cl
buffer (pH 6.5), 2.6% PEG400 solution, equilibrated with 15% PEG400.
The Sso7d-(GCGTTCGC + GCGAACGC) and iodo complexes were
crystallized under similar conditions except 5% 2-methyl-2,4-pentanediol
(MPD) solution was used and the solution equilibrated with 20% MPD.
Data were collected either at room temperature (20 °C) or at -150 °C on
a Rigaku R-Axis IIc image plate area detector system to various
resolution ranges (Table 1). The crystals of both complexes are in the space
group P2₁2₁2₁. The data were processed using the software provided by
Molecular Structure Corporation.

The phases were determined by the multiple isomorphous
replacement (MIR) method using two iodo derivatives (denoted as i-dU-02 and
i-dU-05 with the iodine located at positions T2 and T5 respectively) for
the Sso7d-GTAATTAC complex. The figure-of-merit weighted MIR map
with solvent flattening at 2.5 Å resolution clearly revealed both the
DNA and the Sso7d protein electron density. At that point the refined
structure of the Sac7d-GTAATTAC complex was used to model the
Sso7d-GTAATTAC complex into the MIR electron density. The model
was appropriately corrected against the un-biased map. The structure
was refined by the simulated annealing (SA) procedure incorporated in
X-PLOR using the data with |Fo| > 4σ(F) in the 6.0-1.9 Å range.
Simulated annealing and individual temperature factor refinements
were carried out by X-PLOR. Well-ordered water molecules were
located and gradually included in the model.

Crystals of the complex between Sso7d and GCGTTCGC +
GCGAACGC and the iodo-dU derivatives were obtained. It was found
that the sequence GCGT(iU)CGC + GCGAACGC produced the best
crystals and a 1.6 Å resolution data set was collected at -150 °C. The
structure of the complex was solved by the molecular replacement method
using the AMoRe package in the CCP4 suite. Similar SA refinement
was carried out with a final R-factor (working set) of 20.3% using the
|Fo| > 4σ(F) data in the 6.0-1.6 Å range.

Programs MIDAS Plus (University of California at San Francisco)
and QUANTA (version 4.0, Molecular Simulations, Burlington, MA) were
used to examine the electron density maps and molecular models. The
electrostatic potential diagram was calculated by GRASP. DNA force
field parameters of Parkinson et al. were used. All structures have
been refined by SA and individual B-factor refinement in X-PLOR.
During the refinement some rebuilding of the model was necessary to
improve the fitting of the model to the electron density. The crystal
data and refinement summaries are listed in Table 1.



Coordinates. The atomic coordinates of the two Sso7d-DNA com- 
plexes have been deposited in the Brookhaven Protein Data Bank 
(accession numbers 1BNZ and 1BF4 respectively). 

Solution structure and DNA-binding
properties of a thermostable
protein from the archaeon
Sulfolobus solfataricus



Herbert Baumann, Stefan Knapp, Thomas Lundback, Rudolf Ladenstein and 
Torleif Hard 

The archaeon Sulfolobus solfataricus expresses large amounts of a small basic
protein, Sso7d, which was previously identified as a DNA-binding protein possibly
involved in compaction of DNA. We have determined the solution structure of
Sso7d. The protein consists of a triple-stranded anti-parallel β-sheet onto which an
orthogonal double-stranded β-sheet is packed. This topology is very similar to that
found in eukaryotic Src homology-3 (SH3) domains. Sso7d binds strongly (Kd < 10
µM) to double-stranded DNA and protects it from thermal denaturation. In
addition, we note that ε-mono-methylation of lysine side chains of Sso7d is
governed by cell growth temperatures, suggesting that methylation is related to
the heat-shock response.



Correspondence should be addressed to T.H.

DNA in a random coil conformation occupies a volume
that is almost always much larger than the cell in which
the molecules are contained. Thus, the DNA of all cells
must be structurally organized in a compact form and
yet be readily available for transcription. In the nucleus
of eukaryotic cells the genomic DNA is packed by
histone proteins into nucleosomes which in turn form the
higher-order structures of chromatin. The structural
organization of DNA in prokaryotes is somewhat less well
understood. Archaea and bacteria contain abundant
small, basic proteins which are believed to be involved in
packing and unpacking, maintenance and control of the
genomic DNA (see refs 2-5 for reviews), one of the best
characterised being the HU protein from Escherichia coli.
Some of these proteins are also clearly evolutionarily
related to eukaryotic histones (ref. 6 and work cited
therein). Others are believed to have more specialized
functions, such as to bend the DNA at specific sites.

The thermoacidophilic archaeon Sulfolobus, which can
be isolated from volcanic hot springs, expresses several
small, basic proteins. These proteins were first reported
by Thomm et al. (ref. 8) and were subsequently isolated,
characterized and sequenced by Reinhardt and co-workers.
The basic proteins isolated from Sulfolobus
acidocaldarius can be grouped into three molecular weight
classes of 7,000, 8,000 and 10,000 Mr (7, 8 and 10 kDa),
respectively. The 7 kDa proteins can be further
separated according to their basicity. Sequences are known
for the major component of the 7 kDa class in S.
solfataricus (Sso7d) and the corresponding protein
(Sac7d) as well as three minor components (Sac7a, Sac7b,
and Sac7e) in S. acidocaldarius. The sequences of these
proteins are compared in Fig. 1a. The proteins are all
very rich in lysine residues: 14 residues out of 63 in
Sso7d are lysines. Lysine residues at the amino and
carboxy termini (residues 4, 6, 60, 62 and 63 in Sso7d)
are subjected to ε-mono-methylation within the cell.

The function of the 7 kDa class of proteins in
Sulfolobus is not known. The initial reports emphasize
their DNA-binding properties. The proteins are small,
basic and abundant, that is 'histone-like'. Filter-binding
assays show that Sso7d binds DNA at physiological salt
concentrations and electron micrographs reveal the
formation of compacted nucleoprotein particles with both
double-stranded (ds) and single-stranded (ss) DNA.
The influence of sequence specificity on Sso7d binding
to dsDNA has not been investigated. The functional
significance of ε-mono-methylation of lysines and the effect
of various degrees of methylation on the DNA-binding
properties are unclear.

The Sso7/Sac7 class of proteins may also have other
functions in addition to DNA binding. For instance, the
protein contains the sequence GGGKTGRG (Fig. 1a)
which is reminiscent of the 'P-loops' found in several
classes of ATP- and GTP-binding proteins, and might
therefore be a phosphate binding site.

A protein in S. solfataricus, which appears to be
identical to the previously identified Sso7d, has been
suggested to act as a ribonuclease albeit with a rather
narrow substrate specificity. The protein, called p2 by Fusi
et al., who compare p2 to Sac7d but seem to have been
unaware of the published Sso7d sequence, is reported
to be dimeric under native conditions. This observation
is in contrast to other data, which clearly show that Sso7d
is monomeric (ref. 12, present work).

Sso7d  ATVKFKYKGEEKQVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQ-
Sac7a  VKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEE
Sac7b  VKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLA
Sac7d  VKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEEK
Sac7e  AKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDKLARAEEK

Fig. 1 a, Aligned sequences of basic 7 kDa
proteins from S. solfataricus (Sso7d) and S. acidocaldarius
(Sac7a,b,d,e). The numbering refers to Sso7d. Stars
indicate lysine residues subjected to ε-mono-methylation.
The putative phosphate/nucleotide binding site in Sso7d
has been boxed. Residue 13 in Sso7d has been changed
from Glu (ref. 12) to Gln based on NMR data. b, Mass spectra of
Sso7d from cultures grown at 75 °C and 88 °C. The numbers
indicate calculated masses for the various species. The
expected mass for the non-methylated species calculated
from the sequence is 7147.2 au.

The abundance of Sso7d in S. solfataricus, in
combination with its relatively small size, solubility,
thermostability, and ease of purification makes the protein
suitable for biophysical analyses and structure
determination. We have initiated a series of studies to determine
the structure and dynamics of the Sso7/Sac7 class of
proteins, their nucleic acid-binding affinities and
specificities, as well as possible nucleotide binding/hydrolysis.
In the present work we report on the structure of Sso7d
in solution and provide a more detailed characterization
of its DNA-binding properties. When analyzing the
structure of Sso7d we made the intriguing observation
that this abundant archaeal protein in fact is structurally
similar to that of domains involved in signal
transduction in eukaryotes. We also note that the extent
of ε-mono-methylation of lysine residues in Sso7d depends
on cell culture growth temperature, suggesting that the
methylation is a response to heat shock.

Purification and initial characterization 

Sso7d was purified from S. solfataricus (Methods); the
protein eluted in two peaks from the Mono S column
used in the final purification step. Mass spectrometric
analysis of (pooled) material from the two peaks
indicates the presence of six masses (Fig. 1b). Mass differences
correspond to sequential substitutions of hydrogen
atoms with methyl groups, as a result of the
ε-mono-methylation of lysine residues described previously. The
observation of six peaks with different methylation
patterns is consistent with the notion that five lysine
residues are subjected to methylation. The mass of the
species with the lowest molecular weight corresponds with
that calculated from the sequence (Fig. 1a).
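
The reasoning that six masses imply five methylation sites can be sketched numerically. The example below starts from the unmodified mass quoted in the Fig. 1 caption (7147.2) and adds one CH2 unit per methyl-for-hydrogen substitution; the increment of 14.027 Da is the standard average mass of CH2 and is an assumption here, since the source does not list the individual peak masses. The function name is illustrative.

```python
# Average mass gained when one H is replaced by CH3 (net addition of CH2), in Da.
CH2_MASS = 14.027


def methylation_ladder(unmodified_mass, max_methyl_groups):
    """Expected masses for 0..max_methyl_groups mono-methylated lysines."""
    return [
        round(unmodified_mass + n * CH2_MASS, 1)
        for n in range(max_methyl_groups + 1)
    ]


masses = methylation_ladder(7147.2, 5)
for n_methyl, mass in enumerate(masses):
    print(f"{n_methyl} methyl group(s): {mass:.1f}")
```

Five methylatable lysines give six species (zero to five methyl groups), spaced about 14 mass units apart, which is the pattern the text describes for the mass spectra.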

Sso7d from the two fractions shows NMR chemical
shift differences of 0.02-0.12 p.p.m. affecting backbone
resonances of residues 2, 3, 6, 11, 12, 16, 17 and 44, but
connectivities involving these residues observed in 2D
NOESY spectra are practically identical for material from
the two fractions. The chemical shift differences are most
likely caused by electrostatic effects due to methylation
of one of the lysine residues, because differences in
chemical composition can be ruled out based on mass
spectrometry. The presence of two exchanging conformers
can also be ruled out because NOESY spectra
recorded on the two (separated) species (see Methods)
do not change within a period of several
months.

The extent of ε-mono-methylation of lysine side
chains varies with bacterial growth conditions so that
higher growth temperatures lead to more extensive
methylation (Fig. 1b). The physiological relevance of this
effect is not clear. It is possible that the lysine
methylation is directly related to the stability of the protein
and/or the DNA-protein complex and the response of
the organism to heat shock. The pKa of the lysine side
chain is affected very little by methylation and it seems
less likely that methylation has a direct effect on
DNA-binding affinity.

Sso7d binds strongly to dsDNA 

The equilibrium binding of Sso7d to various
polynucleotides was studied by monitoring changes in the
intrinsic tryptophan fluorescence on formation of the
complexes. The fluorescence of Trp 23, excited at 290 nm, is
quenched by 60-90% on binding and the emission
spectrum is shifted to longer wavelengths (not shown). The
results of titrations performed at low salt (buffer D) and
physiological salt concentration (buffer C) conditions,
respectively, are shown in Fig. 2a,b. Titration curves for
four different dsDNA polymers with alternating
purine-pyrimidine sequences at low salt show an observed
quenching, Qobs, which levels out at about 0.9. There is
little difference in the apparent binding affinity to the
various dsDNAs at low salt, presumably due to
quantitative binding to all DNAs. The binding curves show
saturation at an approximate concentration ratio of 1:6
protein:DNA base pairs (bp), which can be taken as an
estimate of the lower limit for the Sso7d binding site
density on DNA.
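
The quantity followed in these titrations can be illustrated with a short sketch. It assumes the usual definition of fractional quenching, Q = (F0 - F)/F0, where F0 is the fluorescence of the free protein and F the fluorescence at a given DNA concentration; the source does not spell out the formula, and the function name and intensity values below are hypothetical.

```python
def fractional_quenching(free_intensity, observed_intensity):
    """Fraction of the free-protein fluorescence lost on adding DNA."""
    if free_intensity <= 0:
        raise ValueError("free-protein intensity must be positive")
    return (free_intensity - observed_intensity) / free_intensity


# Hypothetical readings in arbitrary units: free protein, then rising DNA.
free = 100.0
readings = [80.0, 45.0, 20.0, 10.0]
for intensity in readings:
    print(f"F = {intensity:5.1f}  Q = {fractional_quenching(free, intensity):.2f}")
```

With these made-up numbers the last point corresponds to 90% quenching, the upper end of the 60-90% range reported in the text.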

There is a definite difference between the Sso7d
binding affinities to various dsDNA sequences at physiological
salt concentrations (Fig. 2b). The binding is
strongest to poly(dIdC) and poly(dAdU), for which the
affinities are approximately equal. The DNA concentration
at half saturation is in this case approximately 8 µM
bp. This number corresponds to an affinity constant of
~0.5-1 x 10⁶ (M sites on DNA)⁻¹ if one (conservatively)
assumes that the maximum binding site density lies within
a range of protein:DNA bp ratios. Binding to poly(dGdC)
is somewhat weaker and binding to poly(dAdT) is about
3-10 times weaker than that to poly(dAdU) and
poly(dIdC).
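
As a rough check of the arithmetic behind this estimate, the sketch below converts a half-saturation DNA concentration into an apparent association constant by taking K as 1/[sites] at half saturation. The site sizes of 4 and 8 bp per protein are assumptions chosen for illustration (they are not taken from the text), as is the function name; depletion of free sites by bound protein is ignored, and the affinity is read as being of order 10⁶ per M of sites.

```python
def apparent_affinity(half_saturation_bp_molar, bp_per_site):
    """Apparent association constant, in (M sites)^-1.

    half_saturation_bp_molar: DNA concentration (mol/L of base pairs) at
        which half of the protein is bound.
    bp_per_site: assumed number of base pairs occupied by one protein.
    """
    if half_saturation_bp_molar <= 0 or bp_per_site <= 0:
        raise ValueError("concentration and site size must be positive")
    site_concentration = half_saturation_bp_molar / bp_per_site
    return 1.0 / site_concentration


half_saturation = 8e-6  # 8 micromolar base pairs, as quoted in the text
for bp_per_site in (4, 8):
    k = apparent_affinity(half_saturation, bp_per_site)
    print(f"{bp_per_site} bp per site: K ~ {k:.1e} per M of sites")
```

With these assumed site sizes the estimate spans roughly 0.5 x 10⁶ to 1 x 10⁶ per M of sites, and it is consistent with the sub-10 µM dissociation constant given in the abstract.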

The binding affinities of Sso7d to various alternating 
dsDNA sequences can be rationalized as follows. First, a 

Fig. 2 Analysis of DNA binding by Sso7d. a, Equilibrium titrations of Sso7d with various polynucleotides and
monodinucleosides based on fractional fluorescence quenching (Qobs). The titrations are performed at low salt
concentration (buffer D) as reverse titrations in which the protein concentration is kept constant (2 µM). b, Equilibrium
titrations performed at a higher salt concentration, which is closer to physiological conditions (buffer C) with 1 µM
protein. Symbols in a and b refer to titrations with poly(dGdC) (1), poly(dAdT) (T), poly(dIdC) (•), poly(dAdU) (A),
poly(dA) (A), poly(dC) (□), poly(rA) (,), poly(rC) (+), dATP (ffi) and dCTP (a). The abscissa legends indicate that
concentrations of double-stranded DNAs are measured in base pairs and concentrations of single-stranded
polynucleotides and monodinucleosides are measured in bases. c, Thermal denaturation profiles of poly(dIdC) in the
absence and presence of bound Sso7d; no added protein (c), Sso7d added to a concentration corresponding to 1:15
Sso7d:DNA bp (lj), and Sso7d added to a concentration corresponding to 1:3.6 Sso7d:DNA bp (A). The poly(dIdC)
concentration was 12 µM bp. The thermal denaturation experiments were performed at low salt concentration
conditions (buffer E).

Fig. 3 a, Two-dimensional 500 MHz NOESY spectrum of
Sso7d (concentration ~2.5 mM in 90%:10% H2O:D2O). b,
Schematic view of the two antiparallel β-sheets in Sso7d.
Hydrogen bonds used in the SA simulations and observed
NOEs are indicated with dashed lines and arrows,
respectively. Additional hydrogen bonds, not used in SA,
are described in the text.



methyl group at position 5 (in the major groove) of the
pyrimidine is unfavourable for binding. This is clear
when comparing binding to poly(dAdU) and
poly(dAdT). Thus, DNA-protein interactions may
occur within the DNA major groove. Second, binding to
dsDNA sequences with two inter-strand hydrogen bonds
is stronger than to those with three hydrogen bonds in
polymers lacking the pyrimidine methyl (that is, when
comparing poly(dAdU) and poly(dIdC) to poly(dGdC)).
This behaviour might be related to some physical
property such as flexibility, considering that Sso7d seems to
induce condensation of DNA.

Titration curves for Sso7d binding to ssDNA and
ssRNA homopolymers in the presence of low salt
concentrations show saturation at Qobs values up to about 0.7.
The binding to ssDNA and ssRNA under these conditions appears
to be weaker than that to dsDNA, although there is a
possibility that these complexes are as strong as those
with dsDNA but that the maximum binding-site
density is lower. However, the thermal denaturation studies
described below indicate that dsDNA is preferred over
ssDNA, because the melting temperature increases on
formation of the complex. Furthermore, increasing the
salt concentrations to physiological levels has a dramatic
effect on the binding to single-stranded polynucleotides
(Fig. 2b). Under these conditions there is only very weak
binding to poly(dA) and poly(dC), whereas no binding
to poly(rA) and poly(rC) can be detected at polymer
concentrations < 100 µM bases. Thus, there seems to be
a large binding preference for dsDNA compared to
ssDNA and ssRNA at higher salt concentration
conditions.

At low salt concentrations it is also possible to
monitor binding of the monodeoxynucleosides dATP and
dCTP through the quenching of Trp 23 fluorescence (Fig.
2a). The titration curves do not show saturation and it
is difficult to estimate stoichiometries and affinities based
on the present data, but the binding seems to be weaker
than that of the DNA and RNA polymers.

Protection of DNA from denaturation 

Thermal denaturation profiles of double-stranded
poly(dIdC) in the absence and presence of bound Sso7d
are shown in Fig. 2c. Poly(dIdC) is thermally unstable
above 32 °C at the conditions used in the experiment
shown in Fig. 2c. Addition of less than stoichiometric
amounts of Sso7d increases the thermal stability of
poly(dIdC), yielding a biphasic DNA melting curve.
Saturation of poly(dIdC) with bound Sso7d again results in
a single phase denaturation profile with a melting
temperature of about 70 °C. Thus, binding of Sso7d increases
the melting temperature of poly(dIdC) by more than
38 °C at low salt concentrations. Similar, albeit somewhat
attenuated, effects can be observed with shorter DNA
oligomers at physiological salt concentrations (data not
shown). It is difficult to quantify the effect of Sso7d
binding to DNA polymers at high salt concentrations
because melting temperatures are high even in the absence
of bound protein. However, it seems possible that Sso7d
binding may shift the melting temperature of DNA above
that of the boiling point of water.
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The remarkable effect of Sso7d binding on DNA ther- 
mal stability is very similar to that of the HTa protein 
from Thermoplasma acidophilum^^ . Stein and Searcy'^ 
argue that the HTa protein may act to protect bacterial 
DNA during short periods of denaturing conditions al- 
lowing the organism to cope with transient periods of 
high temperatures.The Sso7d protein may function in a 
similar manner in Sulfolobtis, The different extent of 
lysine methylation of proteins expressed at different 
growth temperatures may also relate to the bacterial re- 
sponse to heat shock and stabilization of functionally 
important proteins. Howeven the effect of Sso7d me- 
thylation on its DNA-stabilizing properties are un- 
known. 

NMR structure determination 

Two-dimensional NMR spectra of Sso7d were recorded
at 500 and 600 MHz. The ¹H spectrum (Fig. 3a) shows a
very favourable resonance dispersion and could be almost
completely assigned using standard methodologies.
Upon assigning the sequence we found one disagreement
with the published sequence: residue 13,
which is a Glu in the sequence of Choli et al., is in fact
a Gln and this correction has been made in Fig. 1a. The
¹H linewidths in Sso7d (3-8 Hz) are typical for a protein
with a relative molecular mass of 7,000, indicating that
Sso7d is predominantly monomeric under the conditions
used in the NMR experiments.

The NOESY spectrum of Sso7d contains stretches of
very strong sequential NOE connectivities in
combination with strong long-range NOEs, which are
typical for β-sheet secondary structures. These arise
from one double-stranded and one triple-stranded
anti-parallel β-sheet (Fig. 3b). The pattern of intra- and
inter-residue NOE connectivities, the observation of
slowly exchanging backbone amide protons and low amide
temperature coefficients allowed the identification of
14 intramolecular backbone-backbone amide hydrogen
bonds within the anti-parallel β-sheets (Fig. 3b).

The three-dimensional structure of a fragment containing
residues 1-62 of Sso7d was calculated using a
dynamic simulated annealing (SA) protocol with 617
non-redundant NOE distance constraints, 11 dihedral-angle
constraints and 28 hydrogen bond distance
constraints (two constraints per hydrogen bond), that
is, 10.6 constraints per residue. The NOE distances ($d_{ij}$)
were distributed as 233 intraresidue ($i = j$), 151 sequential
($|i-j| = 1$), 51 medium range ($2 \le |i-j| \le 4$), and 182 long
range ($|i-j| \ge 5$) NOEs (Table 1). The quality of the computed
SA structures is good as judged from the low
Lennard-Jones potential energies and the very small average
deviations from idealized geometries. The distance
constraint violation statistics are also good: the average
number of distance constraint violations > 0.3 Å is 0.2
per structure and the largest violation found in any of
the 35 structures is 0.38 Å. The largest dihedral angle
constraint violation is 3.2°.
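
The constraint bookkeeping quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python, not part of the original work; the variable names are hypothetical, and the count of 14 hydrogen bonds is taken from the preceding paragraph.

```python
# Illustrative check of the constraint counts quoted in the text.
# All variable names are hypothetical; the numbers come from the text.

noe_classes = {
    "intraresidue": 233,   # i = j
    "sequential": 151,     # |i - j| = 1
    "medium_range": 51,    # 2 <= |i - j| <= 4
    "long_range": 182,     # |i - j| >= 5
}

n_noe = sum(noe_classes.values())          # NOE distance constraints
n_dihedral = 11                            # dihedral-angle constraints
n_hbonds = 14                              # identified hydrogen bonds
n_hbond_constraints = 2 * n_hbonds         # two distance constraints per bond
n_residues = 62                            # residues 1-62 were calculated

total_constraints = n_noe + n_dihedral + n_hbond_constraints
constraints_per_residue = total_constraints / n_residues

print(f"NOE constraints:         {n_noe}")
print(f"Total constraints:       {total_constraints}")
print(f"Constraints per residue: {constraints_per_residue:.1f}")
```

The four NOE classes add up to the 617 constraints stated in the text, and the total divided by 62 residues reproduces the quoted 10.6 constraints per residue.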

A plot of average backbone dihedral angles in the 35
SA structures is shown in Fig. 4a and plots of dihedral
angle order parameters are shown in Fig. 4b-d. Average
backbone dihedrals are all within the allowed regions of
a Ramachandran diagram (not shown), except for those
of Lys 8. The backbone of this residue is less well defined,
as judged from the angular order parameters,
which results in a sterically unfavourable geometric average.
The superimposed backbones of the final SA structures
are shown in stereo in Fig. 5a. The backbone conformation
within the β-sheet regions is well-defined, as
indicated by atomic backbone root-mean-square devia-



Table 1 Structural statistics for Sso7d (a)

Statistic                                          <SA>                $(\overline{\mathrm{SA}})_r$

R.m.s. deviation from experimental distance (Å)
and dihedral angle (deg) restraints (b)
  distance restraints (617)                        0.025 ± 0.0018      0.024
  dihedral angle restraints (11)                   0.26 ± 0.23         [illegible]

No. of violations (c)
  distance restraints (> 0.3 Å)                    0.20                [illegible]
  dihedral angle restraints (> 1°)                 0.31                [illegible]

$E_{L-J}$ (kcal mol⁻¹) (d)                         -172 ± 20           -214

Deviations from idealized covalent geometry
  bonds (Å)                                        0.0025 ± 0.00016    0.0026
  angles (deg)                                     0.36 ± 0.015        0.36
  impropers (deg)                                  0.24 ± 0.03         0.22

(a) The notation of the NMR structures is as follows: <SA> are the final 35 simulated annealing structures;
$(\overline{\mathrm{SA}})_r$ is the mean structure obtained by averaging the coordinates of the individual SA
structures best fit to each other, followed by minimization by restrained regularization.

(b) The number of restraints is given in parentheses.

(c) The maximum distance violation is 0.38 Å and the maximum dihedral angle violation is 3.2° in an individual
structure.

(d) $E_{L-J}$ is the Lennard-Jones van der Waals' energy calculated with the CHARMM force field.

structural biology volume 1 number 11 november 1994
tions of 0.5 ± 0.1 Å compared to the geometric average
structure (Table 2). Other regions are somewhat less well
defined, as indicated by an overall backbone r.m.s.d. of
1.1 ± 0.2 Å. The side chains of several residues in the hydrophobic
core of Sso7d are also well resolved, as can
be seen in Fig. 5b. The C-terminal fragment (residues
46-60) is somewhat better defined than the loop regions,
with a backbone r.m.s.d. of 0.9 ± 0.2 Å, and a short
α-helix including residues 52-59 is clearly discernible.
This helix can also be deduced from a continuous stretch
of strong sequential $d_{NN}(i,i+1)$ and medium range
$d_{\alpha N}(i,i+3)$ and $d_{\alpha\beta}(i,i+3)$ NOE connectivities.

The final set of SA structures contains several hydrogen
bonds, in addition to those used in the structure
calculations. These involve the backbone amide protons
and carbonyl oxygens of residues 18 and 15, 19 and 15,
20 and 32, 25 and 28, 27 and 25, 50 and 46, and 50 and
47, respectively.

The Sso7d structure 

Sso7d is a globular protein. The tertiary fold consists of
a triple-stranded anti-parallel β-sheet, consisting of residues
21-25, 28-33 and 41-46 (strands III, IV and V,
respectively), onto which a double-stranded β-sheet, made
up of residues 2-7 and 10-15 (strands I and II), is packed
in an orthogonal manner. The hydrophobic core consists
of side chains at the interface of the two sheets, including
those indicated in Fig. 5b. Strands I and II are
connected through a type II reverse turn with a hydrogen
bond between the carbonyl of Tyr 7 and the amide
of Glu 10. Strand II ends in one complete turn of an
α-helix involving residues 16-19, with a hydrogen bond
between the carbonyl of Asp 15 and the amide of Ile 19.
Strands III and IV in the second β-sheet are connected
by a type I reverse turn involving residues 25-28. Thus,
hydrogen bonds between the carbonyl of Val 25 and the
amide of Met 28, and the amide of Val 25 and the carbonyl
of Met 28 are present in the triple-stranded β-sheet,
in addition to those shown in Fig. 3b. Residues 35-40
form a surface loop, containing the glycine tripeptide
Gly 36-Gly 37-Gly 38 (Fig. 6). The structure of this loop
is not very well defined by the NMR constraints and it is
clear that it can show a large degree of inherent flexibil-



Table 2 Atomic r.m.s. difference statistics for the Sso7d structure (a)

Comparison                                   Residues (b)                      Backbone (c)    All heavy atoms

<SA> vs $\overline{\mathrm{SA}}$             1-60                              1.08 ± 0.17     1.60 ± 0.16
                                             46-60                             0.95 ± 0.22     1.72 ± 0.28
                                             2-7, 10-15, 21-25, 28-34, 41-45   0.54 ± 0.09     1.14 ± 0.11

$(\overline{\mathrm{SA}})_r$ vs $\overline{\mathrm{SA}}$   1-60                0.45            0.80

(a) Notations correspond to those defined in Table 1 with the addition that $\overline{\mathrm{SA}}$
is the non-minimized geometric average structure. Residues 61 and 62 are
excluded from the comparison due to lack of structural constraints in this
region.

(b) Superimposed fragments.

(c) Atoms N, C and Cα.
ity. Strand V (residues 41-46) ends in a complete turn
of an α-helix involving residues 47-50. This short helical
segment is anchored through hydrophobic interactions
involving Ala 50 and Pro 51. The backbone of the
C-terminal fragment is not as well-defined as the β-sheets,
but residues 52-59 appear to form two turns of
α-helix. This short helix is packed against the core
through hydrophobic interactions between Leu 54 and
Ala 50.

The surface of Sso7d contains a hydrophobic cleft and
several exposed hydrophobic side chains (Fig. 6a). The
hydrophobic cleft consists of the N-terminal Ala 1 side
chain and the isoleucine residues Ile 16 and Ile 19 on
one side, and the side chains of Pro 51, Leu 54 and Met
57 of the C-terminal helix on the other. The Trp 23 and
Val 25 side chains of strand III are completely exposed
to the solvent and so is the methyl of Ala 44. The side
chains of Tyr 7 and Met 28 are partially exposed on the
surface.

The many basic lysine and arginine side chains are
rather evenly distributed at the surface and the positive
charges seem to be partially compensated for by nearby
acidic side chains. However, the face of the triple-stranded
β-sheet appears to be predominantly positive
in charge. This surface also contains the exposed Trp 23
side chain: the fluorescence of this residue is quenched
by [value illegible] upon formation of a complex with dsDNA. Thus,
this face of the protein may be the DNA binding surface.

Sso7d and eukaryotic SH3 domains 

The topology of Sso7d is very similar to that of eukaryotic
SH3 domains (Fig. 7a). The SH3 domains are small
protein modules (about [number illegible] residues) which, together with
SH2 domains, are found in many proteins involved in
signal transduction in eukaryotes. The SH2 and SH3
domains are commonly found in kinases or phospholipases,
where they are believed to participate in protein-protein
interactions. The structures of SH3 domains
from several proteins have recently been solved by both
NMR spectroscopy and X-ray crystallography.

The minimized average structure of Sso7d is compared
with the structures of the SH3 domains of chicken
brain α spectrin (PDB entry 1SHG) and human fyn
proto-oncogene (PDB entry 1SHF) in Fig. 7a and an
alignment of the three sequences based on secondary
structure and folding topology is shown in Fig. 7b. The
superimpositions included 38 Cα coordinates of the five
β-strands and a fragment from the C terminus in Sso7d
(residues 1-7, 10-16, 21-25, 28-33, and 41-53; Fig. 7b).
The r.m.s.ds with corresponding fragments in the α
spectrin and fyn SH3 domains are in both cases 3.3 Å.
Thus, there is a good quantitative agreement between
these structures. Differences are found at the N and C
termini and for surface loops. In particular, the interconnection
between the β-strands of the two SH3 domains
which corresponds to strands IV and V in Sso7d
is extended into the putative P-loop in Sso7d (Fig. 7a).
Comparison of the complete sequences of Sso7d and
the SH3 domains does not reveal sequence homology.
However, homology can be inferred when including
only the fragments for which there is structural similarity,
that is, when excluding loops and N and C termini,
although any homology is still too weak to be conclusive
by conventional alignment algorithms. Sequence
identities and sequence similarities (aromatic/hydrophobic
residues) in the fragments that were used in the structural
alignment are shown in Fig. 7b. It is worth noting
that several residues which are well conserved among
various SH3 domains are present at the corresponding
positions in Sso7d. These include Val 3 in Sso7d (an alanine
in SH3), Phe 5 and Tyr 7 (aromatics), Lys 12 (lysine),
Val 22 and Trp 23 (hydrophobic), Met 28 and Ile 29 (tryptophan
and tryptophan/hydrophobic), Gly 43 (glycine),
Ala 44 and Val 45 (aromatic or hydrophobic), and Ala 50
(hydrophobic). Sso7d and SH3 domains are also similar
in that they expose hydrophobic surfaces.

The possible origin and significance of the structural
similarity between Sso7d, which is an abundant protein
in the archaeon Sulfolobus, and the SH3 domains,
which appear to have assumed highly specialized roles
in signal transduction in eukaryotes, is unclear. One scenario
may be that the fold has survived in all kingdoms
due to its (thermal) stability and because it forms a suitably
small and stable platform for different functions in
various organisms. An SH3-like fold has also recently
been discovered for a small protein in the photosystem I
complex (PsaE) in cyanobacteria. Structural similarities
to SH3 have also been noted in another DNA-binding
protein: the biotin biosynthetic operon repressor
(BirA) in E. coli.

Methods 

Protein purification. Sulfolobus solfataricus (DSM 1617) isolated
from volcanic hot springs in Italy was purchased from the
"Deutsche Sammlung von Mikroorganismen" (DSM,
Braunschweig). Cultivation was performed aerobically at [temperature illegible]
with an additional 10 g l⁻¹ saccharose in a membrane fermenter
(Bioengineering). The cells were heat-shocked for 90 min at 88 °C
and harvested by centrifugation. Protein was also purified from
cells that had not been subjected to heat shock, for comparison
of the extent of lysine methylation.
100 g cells were lysed in buffer A (10 mM Tris buffer, pH 8.8 with
20 mM NaCl, 10% glycerol) by passing the cell suspension through
a French press. The lysate was centrifuged to remove cell debris
and dialyzed against the same buffer. The cytosolic proteins were
loaded onto a Mono Q (Pharmacia HR 10/10) column equilibrated
with buffer A; Sso7 was found in the flow-through. This fraction
was concentrated in an Amicon stirred cell and applied in 1.5 ml
fractions to a Superose 6 column (90 x 1.5 cm) equilibrated with
30 mM Tris/HCl and 200 mM NaCl at pH 7.4. Fractions containing
Sso7 were pooled, dialyzed against 50 mM potassium phosphate,
50 mM NaCl at pH 6.0, loaded onto a Mono S (Pharmacia HR 10/10)
column equilibrated with the same buffer and eluted with a
linear gradient of buffer B (50 mM potassium phosphate pH 8, 1 M
NaCl). Sso7d eluted at 25% B in two separate peaks, due to the
presence of differently methylated species of the protein.
Sso7d concentrations were measured spectrophotometrically on
a Cary 4E spectrophotometer using an extinction coefficient
calculated from tyrosine (ε = 1400 M⁻¹ cm⁻¹) and tryptophan
(ε = [value illegible] M⁻¹ cm⁻¹) absorption.

Fig. 5 a, Stereoview of superimposed backbone traces of residues 1-62 in Sso7d. For the sake of
clarity, only 11 of the 35 SA structures are shown. The structures are superimposed to minimize
r.m.s. differences of backbone atoms in residues 1-60. N and C termini are coloured in blue and
red, respectively. The loop containing the putative phosphate/nucleotide binding site is coloured
in green. b, Stereoview showing the resolution and packing of hydrophobic side chains in the
protein core. The structures have in this case been superimposed to minimize r.m.s. deviations
between heavy atoms of residues constituting the core.

Fig. 6 Space-filling model of Sso7d showing exposed hydrophobic
(yellow) and aromatic (orange) side chains (tyrosine hydroxyls
are also coloured in orange). The glycines in fragment 36-38 are
coloured in green. The views in (a) and (b) are from opposite
directions. N and C termini are indicated in (a).

NMR samples were prepared in 90%:10% H2O:D2O or 100%
D2O with 20 mM potassium phosphate (pH 5 or 6), 50 mM NaCl
and 0.1% azide. The structure determination is based on data
recorded on the following four NMR samples: 2.5 mM protein at
pH 6 containing material from both peaks eluted from the Mono
S column; ~0.2 mM protein at pH 6 containing material eluting
under peak 2; 1 mM protein at pH 5 containing material eluting
under peak 1; and 2 mM protein containing both fractions in D2O
buffer at pH 6 (non-corrected pH meter reading). The first and
last samples contained two distinct NMR species. A combination
of spectra collected on the second and third samples corresponds
to the NMR spectrum of sample 1.

Mass spectrometric analysis. Mass spectrometry (MS) was
carried out at Pharmacia Bioscience Center, Stockholm, using a
VG Platform mass spectrometer from Fisons Instruments equipped
with an electrospray interface. The mobile phase consisted of
methanol:water (1:1) with 1% acetic acid. The range 700 < (M/z)
< 1700, where M is the mass and z is the charge, was scanned
and calibrated using horse heart myoglobin as a calibration
standard. Uncertainties in molecular mass determinations are
approximately two mass units.
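
As a rough illustration of why a 700-1700 m/z window suits electrospray analysis of a protein of this size, the sketch below lists the protonation states that would fall inside the window. It assumes simple [M + zH]z+ ions and uses a nominal mass of 7000 Da purely as a placeholder; the function names and the mass value are hypothetical and do not come from the paper.

```python
# Illustrative sketch: which charge states of a ~7 kDa protein fall inside
# the scanned m/z window? Assumes ions of the form [M + zH]^(z+).

PROTON_MASS = 1.00728  # Da, mass of a proton


def mass_to_charge(mass, z):
    """Return m/z for a molecule of the given neutral mass carrying z protons."""
    return (mass + z * PROTON_MASS) / z


def charge_states_in_window(mass, low=700.0, high=1700.0, max_z=50):
    """List the charge states whose m/z lies strictly inside (low, high)."""
    return [z for z in range(1, max_z + 1) if low < mass_to_charge(mass, z) < high]


nominal_mass = 7000.0  # Da, placeholder value for illustration only
states = charge_states_in_window(nominal_mass)
for z in states:
    print(f"z = {z:2d}  m/z = {mass_to_charge(nominal_mass, z):8.2f}")
```

Under these assumptions several consecutive charge states are observable in the window, which is what allows the molecular mass to be deconvoluted from the electrospray spectrum.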

Equilibrium titrations. The DNA and RNA polynucleotides used
were purchased from Pharmacia and dissolved in 150 mM NaCl
and 10 mM Tris/HCl at pH 7.4. Polynucleotide concentrations were
determined spectrophotometrically using extinction coefficients
given by Pharmacia. The deoxynucleosides ATP and CTP were
purchased from Boehringer-Mannheim.
Equilibrium titrations were carried out at 20 °C in buffer C (100
mM NaCl, 1 mM MgCl2, 0.1 mM octaethylene glycol monododecyl
ether (C12E8) and 20 mM Tris/HCl at pH 7.4) and in buffer D (0.5
mM C12E8 and 20 mM Tris/HCl at pH 7.4), for which the pH
measurements refer to 20 °C. Titrations were performed as reverse
titrations, in which different amounts of DNA/RNA were added at
constant protein concentration (1 µM in buffer C and 2 µM in
buffer D). Steady-state fluorescence measurements were carried
out on a Shimadzu RF-5000 spectrofluorophotometer using the
methodology and additional titration instrumentation recently
described elsewhere. The excitation wavelength was 290 nm
and emission intensities were sampled at 0.2 nm intervals within
the wavelength range 340-355 nm. Emission spectra were
recorded five times for each titration point in order to minimize
effects of instrumental fluctuations. Measured fluorescence
intensities were corrected for background emission by subtracting
(small) signals from buffer samples and for optical filtering effects
due to DNA absorption at 290 nm.
The fractional fluorescence quenching ($Q_{obs}$) was calculated as
$(I_0 - I)/I_0$, where $I_0$ is the protein fluorescence intensity observed in the
absence of DNA/RNA and $I$ is the intensity in the presence of DNA/
RNA. Binding isotherms are presented as plots of $Q_{obs}$ against the
logarithm of the basepair (dsDNA) or base (ssDNA, ssRNA and
monodeoxynucleosides) concentration.
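
The quenching calculation described above can be sketched in a few lines. This is a minimal illustration only; the function names and the example intensities are hypothetical, and the corrections for background emission and optical filtering are assumed to have been applied already.

```python
import math


def fractional_quenching(i0, i):
    """Q_obs = (I0 - I) / I0, with I0 the intensity without DNA/RNA."""
    if i0 <= 0:
        raise ValueError("I0 must be positive")
    return (i0 - i) / i0


def isotherm_points(i0, titration):
    """Turn (concentration, intensity) pairs into (log10 concentration, Q_obs)."""
    return [(math.log10(c), fractional_quenching(i0, i)) for c, i in titration]


# Made-up example: concentrations in M (base pairs), intensities in arbitrary units.
example = [(1e-7, 95.0), (1e-6, 70.0), (1e-5, 35.0)]
for log_c, q in isotherm_points(100.0, example):
    print(f"log10(c) = {log_c:5.1f}   Q_obs = {q:.2f}")
```

Plotting the second element of each pair against the first gives the kind of binding isotherm the text refers to.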

DNA melting studies. Light absorption of poly(dIdC) at 260 nm
was measured as a function of temperature on a CARY 4E
spectrophotometer, which allows the simultaneous measurement
of up to six melting curves. The temperature was increased in
steps of 1 °C during a time period of 30 s, followed by a holding
time of 60 s prior to absorbance measurements. The denaturation
experiments were performed in 5 mM Tris/HCl at pH 7.0 (buffer
E) with various concentrations of added Sso7d.
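
The text does not state how melting temperatures were read from the absorbance curves. One common convention, assumed here purely for illustration, is to take the temperature at which the normalized absorbance change reaches one half. The sketch below implements that midpoint estimate for a single-transition curve; the function name and the data are hypothetical.

```python
def melting_temperature(temps, absorbances):
    """Estimate Tm as the temperature where the normalized absorbance
    change first crosses 0.5, using linear interpolation between points.

    Assumes a single, monotonic melting transition; a biphasic curve would
    need each transition to be analysed separately.
    """
    a_min, a_max = min(absorbances), max(absorbances)
    if a_max == a_min:
        raise ValueError("absorbance does not change; no transition found")
    frac = [(a - a_min) / (a_max - a_min) for a in absorbances]
    for k in range(1, len(frac)):
        f0, f1 = frac[k - 1], frac[k]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[k - 1], temps[k]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("no crossing of the half-transition point")


# Made-up curve sampled in 1 degree steps, as in the protocol above.
temps = [30.0, 31.0, 32.0, 33.0, 34.0]
a260 = [1.00, 1.02, 1.15, 1.28, 1.30]
print(f"Estimated Tm: {melting_temperature(temps, a260):.1f} C")
```

For the biphasic curves obtained at sub-stoichiometric protein concentrations, this simple midpoint would not be meaningful without first separating the two phases.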

NMR spectroscopy. NMR spectra were recorded on Varian Unity
500 and 600 NMR spectrometers operating at magnetic fields of
11.74 and 14.09 T, respectively, and equipped with programmable
pulse modulators and pulsed field gradient hardware. Spectra were
recorded at 293, 303, 313 and 323 K. ¹H chemical shifts at 303 K
(available from the authors) are referenced to H2O at 4.74 p.p.m.
Phase sensitive two-dimensional spectra were recorded in the
hypercomplex mode.
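
The quoted field strengths and nominal spectrometer frequencies are related through the proton gyromagnetic ratio. The short check below is an illustration only; the constant is the standard value of γ/2π for ¹H, and the function name is hypothetical.

```python
# 1H gyromagnetic ratio divided by 2*pi, in MHz per tesla.
GAMMA_H_MHZ_PER_T = 42.577


def proton_frequency_mhz(field_tesla):
    """Return the 1H Larmor frequency (MHz) at the given field (T)."""
    return GAMMA_H_MHZ_PER_T * field_tesla


for field in (11.74, 14.09):
    print(f"{field:5.2f} T  ->  {proton_frequency_mhz(field):6.1f} MHz")
```

The two fields correspond to proton frequencies of approximately 500 and 600 MHz, matching the instrument designations.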



Two-dimensional homonuclear DQF-COSY, NOESY, and clean-TOCSY
spectra were recorded using spectral widths of 6000
Hz, 2*512 t1 increments, 1024 complex data points in the
acquisition time domain and with 8-32 transients per t1 increment.
NOESY spectra were recorded using cross relaxation mixing times
of 60 or 200 ms and clean-TOCSY spectra were recorded using
isotropic mixing times of 10, 60 or 80 ms. A 2D ¹H,¹³C-HSQC
spectrum was recorded using gradient selection with ¹H and
¹³C sweep widths of 6000 and 20000 Hz, respectively, 2*128
t1 increments, 512 complex data points and 160 transients per
increment. The HSQC sequence was optimized for a C-H scalar
coupling constant of 140 Hz, with the ¹³C transmitter placed at
57 p.p.m. 2D SS-NOESY spectra were recorded with a sweep
width of 8000 Hz and a 200 ms mixing time. The third pulse in
the SS-NOESY sequence is a shifted laminar pulse creating a
zero net excitation at the frequency of the transmitter (water
resonance). Water suppression was achieved by presaturation of
the water signal or presaturation in combination with SCUBA water
suppression. No presaturation was used in the HSQC and SS-NOESY
experiments.

NMR spectra were processed using software from Varian (VNMR)
and/or Biosym Technologies (version 2.2). Data processing typically
involved apodization with shifted Gaussian functions in the t2
(acquisition time) domain and sine/cosine bell functions in t1, and
baseline correction using routines available within the two
software packages. Processed spectra typically contained
1024x1024 real data points.

Fig. 7 a, Comparison of folding topologies in Sso7d and SH3 domains. The stereo picture contains
the superimposed backbones of Sso7d (grey), the SH3 domains of chicken brain α spectrin (green)
and the human fyn proto-oncogene (blue). b, Secondary structure based alignment of the Sso7d
sequence to those of the SH3 domains of chicken brain α spectrin (C spec α), and the human fyn
proto-oncogene (H fyn). Elements of secondary structure in Sso7d are shown at the top. The
numbering refers to the Sso7d sequence. The grey bars indicate fragments used in the structure-based
alignment. Orange boxes indicate similar or identical hydrophobic residues within the
aligned sequences. The blue and green boxes denote a lysine and a glycine which are located at
identical positions in the aligned sequences.

NMR data analysis. Spin system identification and sequential
assignments of resonances in Sso7d were carried
out in homonuclear 2D spectra using standard methodologies.
The natural abundance ¹³C,¹H HSQC spectrum aided significantly
when sorting out ¹H methyl and aromatic resonances. Most
assignment work and collection of NOE constraints were carried
out on spectra recorded at 303 K. Analysis of NMR spectra and
compilation of NOE data were performed using the interactive
computer graphics program ANSIG.

Stereospecific assignments of prochiral methylene groups were
carried out by identifying predominant χ1 rotameric states using
$^3J_{\alpha\beta}$ coupling constants measured in DQF-COSY spectra and
intraresidue NOEs measured with a short (60 ms) mixing time.
The relative magnitudes of the two $^3J_{\alpha\beta}$ coupling constants
could also be measured in clean-TOCSY spectra recorded with a
short (10 ms) mixing time using reported simulations as a
reference for expected cross peak intensities. Valine methyl groups
were stereospecifically assigned, and χ1 rotamers determined, from the
magnitude of the $^3J_{\alpha\beta}$ coupling and the relative intensities of
intraresidue NOE connectivities (note that the notations
of valine γ1 and γ2 methyls in ref. 41 are exchanged compared to
convention).

The χ1 rotameric states of Thr 2 and Thr 32 were estimated as
follows. Both residues have relatively small $^3J_{\alpha\beta}$ coupling
constants and the HN-Hα cross peaks in DQF-COSY are quadratic,
indicating that χ1 = 60° or χ1 = 180°. Inspection of the short mixing
time NOESY spectrum revealed one relative ordering of two intraresidue
NOE intensities [labels illegible] in Thr 2, which
is consistent with χ1 = 180°, and another ordering [labels illegible] in Thr 32, which
is consistent with χ1 = 60°.

NOEs were quantified as distance constraints based on cross peak
volumes measured in a NOESY spectrum recorded with a mixing
time of 60 ms. The conversion of volumes into distances was
based on calibration of observed intraresidue and sequential NOEs
within well-defined segments of anti-parallel β-sheet. NOE
volumes involving HN protons were corrected for the presence of
10% D2O in the sample. Cross peak volumes involving methyl
protons were divided by three prior to conversion into distance
constraints. Distance constraints were divided into four classes:
strong (< 2.7 Å), medium (< 3.3 Å), weak (< 5.0 Å) and very weak
(< 6.0 Å). Pseudoatoms with appropriate distance corrections were
created for non-stereospecifically assigned methylene protons,
aromatic ring protons and the methyl groups in leucines. A
(reduced) pseudoatom correction of 0.3 Å was used to account
for effects due to rapid rotation of methyl groups.
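
The volume-to-distance conversion and the four constraint classes can be illustrated as follows. The sketch assumes the usual inverse sixth-power dependence of NOE intensity on distance, which the text does not state explicitly; the reference volume and reference distance are hypothetical calibration inputs, as are all function names.

```python
# Upper bounds (in angstrom) of the four constraint classes quoted in the text.
UPPER_BOUNDS = [("strong", 2.7), ("medium", 3.3), ("weak", 5.0), ("very weak", 6.0)]


def noe_distance(volume, ref_volume, ref_distance, methyl=False):
    """Convert a cross peak volume to a distance, assuming volume ~ r**-6.

    ref_volume and ref_distance are a calibration pair (hypothetical inputs
    here). Methyl cross peak volumes are divided by three first, as in the text.
    """
    if methyl:
        volume = volume / 3.0
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


def classify(distance):
    """Return (class name, upper bound) for a distance, or None if > 6.0."""
    for name, bound in UPPER_BOUNDS:
        if distance < bound:
            return name, bound
    return None


# Made-up example: a peak 64 times weaker than the calibration peak.
r = noe_distance(volume=100.0 / 64, ref_volume=100.0, ref_distance=2.2)
print(f"distance = {r:.2f} A, class = {classify(r)}")
```

Because of the sixth-root dependence, a 64-fold drop in volume only doubles the estimated distance, which is why coarse distance classes are adequate for this kind of data.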
A total of 14 hydrogen bonded amide protons could be identified
either as slowly exchanging resonances in a TOCSY spectrum of
Sso7d dissolved in D2O, or as amide-proton resonances for which
the temperature dependence of the chemical shift is small (< 5
p.p.b. K⁻¹) compared to that of C-terminal residues which are
exposed to the solvent (> 8 p.p.b. K⁻¹). These experimentally
supported hydrogen bonds (between backbone amide protons
and carbonyl oxygens) were imposed in the structure calculations
as 28 distance constraints with lower and upper bounds of 1.8 Å
and 2.4 Å for amide hydrogen to carbonyl oxygen distances, and
2.6 Å and 3.4 Å for amide nitrogen to carbonyl oxygen distances,
respectively. The hydrogen bond constraints were imposed at a
late stage of the structure refinement at which point hydrogen
bond donor-acceptor pairs could be unambiguously identified. All
hydrogen bonds used in the calculations are within well-defined
regions of anti-parallel β-sheet. A table of sequential assignments
of the Sso7d ¹H NMR spectrum at 30 °C and pH 6.0 is available
from the authors on request.



Structure calculations. Three-dimensional structures were
determined using a dynamic simulated annealing (SA) method
implemented within the X-PLOR 3.0 program. The protocol of
Nilges et al. was used with some modifications, as described
below. Extended peptide conformations were used as starting
structures in the simulations. The X-PLOR force field, containing
potentials for chemical bonds, repulsive van der Waals' interactions
and experimental (distance and dihedral) constraints, was used.
The force constant of the distance constraint potential was set to 50
kcal mol⁻¹ Å⁻² and the force constant of the dihedral (χ1) square
well potential was set to 200 kcal mol⁻¹ rad⁻². Force constants for
planarity and chirality were set to 50 kcal mol⁻¹ rad⁻². The
simulations were carried out in five stages: i, 100 steps of Powell
energy minimization to remove bad non-bonded contacts; ii, 15
ps of dynamics at 1000 K with normal van der Waals' radii and a
low repulsive force constant (0.002 kcal mol⁻¹ Å⁻⁴); iii, 10 ps of
1000 K dynamics during which the repulsive force constant was
increased to 0.1 kcal mol⁻¹ Å⁻⁴ and the asymptote in the NOE soft
square well potential (constant c in ref. 44) was increased from
0.0 to 1.0 (in 10 steps); iv, cooling to 300 K during 5.6 ps (28 steps
of 0.2 ps with 25 K cooling/step) with a repulsive force constant of
4.0 kcal mol⁻¹ Å⁻⁴ and van der Waals' radii scaled by 0.8; and v,
1200 steps of Powell minimization with normal van der Waals'
radii and force constants for planarity and chirality set to 500 kcal
mol⁻¹ rad⁻². A 1 fs time step was used throughout with bonds
constrained using the SHAKE algorithm during stages ii-iv.
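
The cooling stage and the overall length of the dynamics can be verified numerically. The sketch below is an illustration of the schedule as described, not a reproduction of the X-PLOR input; the function and variable names are hypothetical.

```python
def cooling_schedule(t_start=1000.0, t_end=300.0, step_k=25.0, step_ps=0.2):
    """Return (temperatures after each cooling step, total cooling time in ps)."""
    n_steps = int(round((t_start - t_end) / step_k))
    temps = [t_start - step_k * (k + 1) for k in range(n_steps)]
    return temps, n_steps * step_ps


temps, cooling_ps = cooling_schedule()

# Stage ii (15 ps) and stage iii (10 ps) at 1000 K, followed by stage iv cooling.
total_md_ps = 15.0 + 10.0 + cooling_ps
time_step_ps = 0.001  # 1 fs
n_integration_steps = round(total_md_ps / time_step_ps)

print(f"cooling steps:       {len(temps)} (final temperature {temps[-1]:.0f} K)")
print(f"cooling time:        {cooling_ps:.1f} ps")
print(f"total dynamics time: {total_md_ps:.1f} ps")
print(f"integration steps:   {n_integration_steps}")
```

Cooling from 1000 K to 300 K in 25 K decrements takes 28 steps, and at 0.2 ps per step this gives the 5.6 ps quoted for stage iv.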
An ensemble of structures was initially calculated after the
sequential assignments were almost completed and about 300
distance constraints had been collected. The simulations were then
repeated several times during structure refinement. The final round
of SA contained 50 simulations out of which 35 converged, yielding
low energy structures. An average SA structure ($\overline{\mathrm{SA}}$) was
calculated from the 35 SA structures by averaging superimposed
coordinates. The average structure was also minimized ($(\overline{\mathrm{SA}})_r$)
using the same potential as in stage v of the SA protocol. The
structures were analyzed with respect to the precision of atomic
positions and dihedral angles, constraint violations, deviations from
idealized bond geometries and non-bonded interaction potentials,
and further characterized with respect to dihedral angle
conformations and hydrogen bonding. Dihedral angle order
parameters, $S^{angle}$, reflecting the precision of the corresponding
dihedral within the ensemble, were calculated according to Hyberts
et al. A value of $S^{angle}$ approaching unity indicates a very well-defined
dihedral angle whereas an isotropic distribution yields
$S^{angle} = 0$ (but $S^{angle} = 0$ need not necessarily reflect an isotropic
distribution). The ensemble of SA structures was also searched
for additional intramolecular hydrogen bonds using the following
two criteria: the distance between the donor hydrogen and
acceptor oxygen and the distance between the two heavy atoms
must be less than 2.5 Å and 3.5 Å, respectively. Hydrogen bonds
mentioned in the text fulfil these criteria in at least 18 of the 35 SA
structures. Structural r.m.s. differences quoted in the text refer to
comparisons with the average structure ($\overline{\mathrm{SA}}$). It should be noted
that r.m.s. difference comparisons containing 'all atoms' can sometimes
be erroneous and too large due to the specific atom labelling of phenyl
and tyrosyl rings and carboxylate groups. This is because the
computer program evaluating r.m.s. differences does not always
consider the inherent symmetry of these groups and therefore
can give a large r.m.s. difference even in the case of perfect overlap
(P. Kraulis, personal communication).
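
The dihedral angle order parameter mentioned above is commonly computed as the length of the mean unit vector of the angle across the ensemble. The sketch below follows that definition; it is an illustration with hypothetical function names and made-up angles, not the program used in the original work.

```python
import math


def angular_order_parameter(angles_deg):
    """Order parameter S for one dihedral angle across an ensemble.

    Each angle is treated as a unit vector (cos, sin); S is the length of
    the vector sum divided by the number of structures, so 0 <= S <= 1.
    """
    n = len(angles_deg)
    if n == 0:
        raise ValueError("at least one angle is required")
    sum_x = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(sum_x, sum_y) / n


# A well-defined angle: all 35 structures agree.
print(f"{angular_order_parameter([-60.0] * 35):.3f}")

# Two equally populated conformations 180 degrees apart give S close to 0,
# even though the distribution is far from isotropic.
print(f"{angular_order_parameter([0.0, 180.0] * 10):.3f}")
```

The second example illustrates the caveat in the text: a vanishing order parameter does not by itself imply an isotropic distribution of the angle.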
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abstract: The genes for two Sac7 DNA-binding proteins, Sac7d and Sac7e, from the extremely 
thermophilic archaeon Sulfolobus acidocaldarius have been cloned into Escherichia coli and sequenced. 
The sac7d and sac7e open reading frames encode 66 amino acid (7608 Da) and 65 amino acid (7469 Da) 
proteins, respectively. Soutliem blots indicate that these are the only two Sac7 protein genes in 5. 
acidocaldarius, each present as a single copy. Sac7a, b, and c proteins appear to be carboxy-lerminal 
modified Sac7d species. The transcription initiation and termination regions of the sac7d and sac7e genes 
have been identified along with the promoter elements. Potential ribosome binding sites have been 
identified downstream of the initiator codons. The sac7d gene has been expressed in E. coli, and various 
physical properties of the recombinant protein have been compared with those of native Sac7. The UV 
absorbance spectra and extinction coefficients, the fluorescence excitation and emission spectra, the circular 
dichroism, and the two-dimensional double-quantum filtered NMR spectra of the native and recombinant 
species are essentially identical, indicating essentially identical local and global folds. The recombinant 
and native proteins bind and stabilize double-stranded DNA with a site size of 3.5 base pairs and an 
intrinsic binding constant of 2 x 10^ for poly[dGdC]-poly[dGdC] in 0.01 M KH2PO4 at pH 7.0. The 
availability of the recombinant protein permits a direct comparison of the thermal stabilities of the 
methylated and unmethylated forms of the protein. Differential scanning calorimetry demonstrates that 
the native protein is extremely thermostable and unfolds reversibly at pH 6.0 with a Tm of approximately 
100 °C, while the recombinant protein unfolds at 92.7 °C. 



Small basic DNA-binding proteins have been isolated from 
various archaea, some of which have been shown to be 
associated with the nucleoid or chromatin and presumably 
perform a histone-like or helix-stabilizing function in these 
organisms (Searcy, 1975; Stein & Searcy, 1978; Searcy & 
Delange, 1980; Thomm et al., 1982; Grote et al., 1986; Lurz 
et al., 1986; Choli et al., 1988a,b; Reddy & Suryanarayana, 
1989; Sandman et al., 1990), although the actual function 
of many of these proteins has not been demonstrated. HTa 
protein from the thermophilic archaeon Thermoplasma 
acidophilum shows considerable homology to eukaryotic 
histones and Escherichia coli HU protein (Searcy, 1975; 
Searcy & Delange, 1980). Hmf1 and Hmf2, two DNA 
binding proteins from Methanothermus fervidus, are also 
homologous to some of the eukaryotic histones (Sandman 
et al., 1990). 

Sulfolobus, a thermoacidophilic archaeon, expresses a 
number of small basic DNA-binding proteins ranging in 
molecular weight from 7000 to 10 000 (Kimura et al., 1984; 



This work was supported by the Biotechnology Research and 
Development Corporation (J.W.S. and S.P.E.) and the National Institutes 
of Health (GM49686) (J.W.S. and S.P.E.). A preliminary report of 
this work was given at the Swedish Biophysical Society Meeting, June 
1990, Lovanger, Sweden, and at the Biophysical Society Meeting, 
Feb 9-13, 1992, Houston, TX. 

* Authors to whom correspondence should be sent. Phone: 618- 
453-6479 or 618-453-6466. Fax: 618-453-6440. E-mail: rgupta@ 
som.siu.edu or jshriver@som.siu.edu. 

Present address: Vanderbilt University, Department of Molecular 

1161 21st Ave. South, Nashville TN 37235. 
Present address: Department of Pharmacology and Molecular 
Biology, Chicago Medical School, N. Chicago, IL 60064-3095. 
Abstract published in Advance ACS Abstracts, July 15, 1995. 



(Grote et al., 1986; Choli et al., 1988a). These have no 
apparent homology to any of the histones. Much of the early 
work on these proteins resulted from a search for chromatin 
proteins that might stabilize the genomic DNA at the high 
growth temperature. Sulfolobus acidocaldarius grows optimally 
in the range of 70-80 °C, while Sulfolobus solfataricus 
grows optimally at approximately 75-85 °C. The 
G+C base composition of Sulfolobus DNA is about 40%, 
and its cellular salt concentration is relatively low, making 
a helix-stabilizing protein presumably necessary (Reddy & 
Suryanarayana, 1988). The 7 kDa class of proteins has been 
presented as a likely candidate given that they are present 
in relatively large amounts in the cell (Grote et al., 1986; 
Choli et al., 1988a,b). 

Five proteins have been isolated in the 7 kDa class from 
S. acidocaldarius (Kimura et al., 1984; Choli et al., 1988b), 
and have been labeled Sac7a through Sac7e, in order of 
increasing basicity. Four of these, Sac7a, b, d, and e, have 
been sequenced (Figure 1) (Kimura et al., 1984; Choli et 
al., 1988b), and only minor differences among them have 
been noted. The sequence of Sac7c has not been reported. 
The number of genes encoding the 7 kDa proteins of S. 
acidocaldarius has not been determined. Comparison of the 



* Abbreviations: DSM, Deutsche Sammlung für Mikroorganismen; 
IPTG, isopropyl β-D-thiogalactopyranoside; NMR, nuclear magnetic 
resonance; COSY, correlation spectroscopy; DQF-COSY, double- 
quantum filtered correlation spectroscopy; DSC, differential scanning 
calorimetry; CD, circular dichroism; Sac7, a group of 7 kDa DNA- 
binding proteins from Sulfolobus acidocaldarius, individually referred 
to as Sac7a, Sac7b, Sac7c, Sac7d, and Sac7e, in order of increasing 
basicity; Sso7, a group of 7 kDa DNA-binding proteins from Sulfolobus 
solfataricus. 
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amino acid sequences indicates that there must be at least 
two separate genes coding the 7d and 7e species. The high 
degree of similarity observed in the primary sequence of the 
7d and 7e proteins suggests that two genes arose through 
gene duplication. Sac7a and Sac7b are truncated versions 
of the Sac7d protein, most likely resulting from truncated 
genes, posttranslational processing, or degradation during 
isolation. 

Specific ε-amino monomethylation of lysines 4 and 6 is 
characteristic of the Sac7a, b, and d proteins, while Sac7e is 
monomethylated at lysines 6, 62, and 63 (residue 4 is an 
arginine in Sac7e) (Kimura et al., 1984; Choli et al., 1988b). 
No lysine methylation has been detected in the C-terminus 
of Sac7a, b, or d, presumably since there are no lysines at 
positions 62 and 63 in these proteins, although Sac7d 
contains lysines at positions 64 and 65. The Sso7d protein 
from S. solfataricus is monomethylated at lysines 4 and 6 
and also at lysines 62, 64, and 65 (Choli et al., 1988a). The 
role of lysine monomethylation has not been determined but 
is most likely nontrivial given the specificity (there are 12- 
14 lysines in these proteins) and the occurrence in both S. 
acidocaldarius and S. solfataricus proteins. Baumann et al. 
(1994) have recently shown that an increase in Sso7d 
methylation occurs upon heat shock and indicate that 
methylation may be directly related to protein stability. 
However, methylation may be an incidental response to an 
increase in methylase activity directed at other processes. 
Methylation may also increase the reversibility of the 
unfolding process rather than changing the stability. A direct 
calorimetric measurement of the unfolding and stability of 
these proteins has not been reported. 

The Sac7 proteins would appear to be ideal models for 
studies of protein folding and stability given their small size, 
the absence of cysteine, and expected high thermostability. 
Biophysical analysis of these proteins is hampered, however, 
by the inability to selectively isolate a homogeneous isoform 
in large quantities. The differential methylation of individual 
7 kDa proteins could further complicate quantitative studies 
of structure and stability as well as DNA binding. Therefore, 
we have cloned and expressed the gene encoding the Sac7d 
species in E. coli to facilitate elucidation of the solution 
structure of the protein by NMR with high resolution, probing 
of the thermostability and DNA-binding properties of the 
protein by site-directed mutagenesis, and determination of 
the role of methylation. The availability of recombinant 
protein allows for a direct comparison of the stability of the 
methylated and unmethylated proteins. In the process of 
cloning the sac7d gene, the gene for Sac7e has also been 
cloned and sequenced, and we have delineated the transcription 
initiation and termination regions of the sac7d and sac7e 
genes along with the promoter elements. 

An initial structure of the native Sso7d protein has been 
recently published by Baumann et al. (1994), and a high- 
resolution structure of the homogeneous, recombinant Sac7d 
protein has been completed (Edmondson, Qiu, and Shriver, 
manuscript submitted). There are significant differences 
between these structures, and it remains to be determined if 
these can be attributed to sequence differences, lysine 
methylation, or quality of data due to heterogeneity in the 
native preparation. The spectroscopic, DNA binding, and 
calorimetric comparisons of the native and recombinant Sac7 
proteins reported here indicate little difference in structure, 
but significant difference in thermostability. 
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MATERIALS AND METHODS 

Strains of Microorganisms. E. coli strain DH5αF'IQ [F 
lacl'^ZMAlSIHs. {lacZYA-argF) recAl hsdRWivy- m^-^)] was 
purchased from Gibco BRL. E. coli strains HMS174 (p 
recA r-ki2 m\i2 Rif). BL21 (F" ompT r's ^-b), and their 
derivatives were generous gifts from F. William Studier 
(Studier et al., 1990). E. coli strain CJ236 (dut- ung-) was 
obtained from Jack Parker (Southern Illinois University, 
Carbondale, IL). S. solfataricus P2 and S. acidocaldarius 
DG6 were gifts from Dennis Grogan (Grogan, 1989, 1991). 
S. acidocaldarius (DSM 639) and S. solfataricus P1 (DSM 
5354) were purchased from Deutsche Sammlung für 
Mikroorganismen (DSM). 

The Sulfolobus strain used here was received from 
Zillig (originally called S. solfataricus P1). We have isolated 
a single colony of our organism on solid medium (Grogan, 
1989) and have compared the HindIII, EcoRI, and SalI 
restriction fragment patterns of its genomic DNA with two 
strains of S. acidocaldarius (DG6 and DSM639) and two 
strains of S. solfataricus (DSM5354 and P2) according to 
Grogan (1989). In each case the restriction pattern of our 
organism is identical to the S. acidocaldarius strains and is 
distinctly different from the S. solfataricus strains. This has 
been further substantiated by Southern analysis of genomic 
DNA using Sac7 protein gene specific oligonucleotides (see 
Results). We have designated our laboratory strain as S. 
acidocaldarius RGJM. There has been confusion in the 
literature regarding the identity of the strains of two 
Sulfolobus species used in various laboratories at different 
times. Zillig (1993) has recently addressed this issue and 
tried to clarify the confusion. 

Growth of Microorganisms. E. coli strains were grown 
in Luria Bertani media (1% bactotryptone/1% NaCl/0.5% 
yeast extract) by standard methods (Sambrook et al., 1989). 
Small scale cultures of Sulfolobus (10-200 mL) were grown 
in Brock's medium (Brock et al., 1972) at 75 °C, supplemented 
with 0.2% sucrose. Large scale Sulfolobus cultures 
were grown either in 10 L polypropylene carboy at 78 to I 
°C or in a 16 L VirTis glass fermentor at 70-72 °C with 
vigorous aeration using DeRosa's medium (DeRosa & 
Gambacorta, 1975) supplemented with 0.1% glucose and 
0.1% glutamic acid. 

Enzymes and Chemicals. Restriction enzymes, alkaline 
phosphatase, T4 DNA ligase, T4 DNA polymerase, and T4 
polynucleotide kinase were purchased from New England 
Biolabs, Brisco Ltd., BRL, or United States Biochemical Co. 
[32P]H3PO4 and 5'-[α-35S]adenosine thiotriphosphate 
triethylammonium salt were purchased from ICN Biochemicals 
Inc. and Amersham Co., respectively. Sequenase version 
2.0 DNA sequencing kit was obtained from United States 
Biochemical Co. Specific deoxyoligonucleotides were purchased 
from Research Genetics. The list of the oligonucleotides 
used in this work is presented in Table 1. Difco 
bacterial media were purchased from Fisher Scientific. 
CM52 was obtained from Whatman and Sephacryl S-100-HR 
from Sigma Chemical Co. All other chemicals were 
reagent grade and obtained primarily from Fisher Scientific, 
J. T. Baker Co., and Sigma Chemical Co. Laboratory water 
was routinely purified to 18.3 MΩ resistance with a recycling 
Barnstead Nanopure system. 

Genomic DNA Isolation. Cells from 10-20 mL cultures 
were pelleted and resuspended in 0.2-0.3 mL of 10 mM 
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Tris-HCl pH 8.0/1 mM EDTA/1% SDS. This solution was 
extracted once each with equal volumes of phenol, phenol/ 
chloroform/isoamyl alcohol (25:24:1), and chloroform/isoamyl 
alcohol (24:1). Sodium acetate (3 M, pH 5.2) was added to 
the final aqueous phase to a concentration of 0.3 M, followed 
by DNA precipitation with three volumes of ice-cold ethanol. 
The DNA was spooled onto a thin glass rod, washed in 70% 
ethanol, and air dried. The DNA was dissolved in 10 mM 
Tris-HCl, pH 8.0/1 mM EDTA. 

Cloning, Hybridization, and Sequencing. The preparation 
of a f 5/1 genomic library of S. acidocaldarius RGJM in E. 
coli strain DH5αF'IQ and screening of the library by colony 
hybridization was according to published procedures (Berger 
& Kimmel, 1987; Sambrook et al., 1989). Southern and dot 
blot hybridizations were carried out using nitrocellulose 
membranes according to the manufacturer's protocols 
(Schleicher & Schuell) which are based on the method of 
Southern (1975). The preparation of [γ-32P]ATP and 5' 32P 
end-labeling of oligonucleotides was by standard methods 
(Johnson & Walseth, 1979; Gupta, 1984; Sambrook et al., 
1989). DNA was sequenced by the dideoxy chain termination 
method (Sanger et al., 1977) using a Sequenase version 
2.0 kit. The final sequences were determined from both 
strands. The standard universal primers for Stratagene's 
pBluescript vectors (Short et al., 1988) and specifically 
synthesized oligonucleotides were used in sequencing reactions. 
DNA sequences were analyzed using the computer 
program DNA Inspector IIe (Textco Co.). 

Primer Extension. Total RNA from S. acidocaldarius 
RGJM was isolated by previously published procedures 
(Emory & Belasco, 1990). The primer extension assay was 
conducted as described in the Promega "Protocol and 
Applications" manual. 

Oligonucleotide-Directed Mutagenesis. Procedures for the 
oligonucleotide directed mutagenesis were those outlined in 
the Bio-Rad Muta-Gene manual and are based on Kunkel's 
method (Kunkel et al., 1987) using E. coli dut- ung- strains. 
We were unable to propagate the substrate for oligonucleotide 
directed mutagenesis, pBluescript KS+/sac7d (see 
Results for the description and nomenclature of the plasmids), 
in E. coli strain CJ236 (dut- ung-). Therefore, we used 
DH5αF'IQ as the host cell for the production of single- 
stranded template and as the recipient for transformation with 
mutagenized plasmid and modified the procedure for the 
selection of mutant plasmid. Colonies arising from transformation 
with the plasmids from the mutagenesis reaction 
to create the NdeI site were pooled and grown as a mixed 
culture. Plasmids isolated from these cells were digested 
with NdeI and separated on a 0.8% agarose gel. Linear 
plasmids were isolated from the gel, recircularized, and again 
used to transform DH5αF'IQ. Plasmids were then extracted 
from individual colonies and screened for the presence of 
an NdeI restriction site by digestion with the enzyme. Final 
confirmation of the desired mutation in the plasmids was 
obtained by sequencing. 

Gene Expression. For gene expression, pET-3b/sac7d was 
transformed into E. coli strain BL21 (DE3) pLysS (Studier 
et al., 1990). For protein isolation, a 10 mL culture of this 
transformant was grown overnight in LB broth containing 
ampicillin (200 μg/mL) and chloramphenicol (27 μg/mL). 
Of this, 0.6-1 mL was used to inoculate 50 mL of fresh 
medium. At an A600 of 0.3-0.6, 25 mL of the culture was 
diluted into 1 L of new medium. The culture was induced 
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upon reaching an Aaoo of 0.8-0.95 by adding IPTG to a final 
concentration of 0.4 mM. A small aliquot of each culture 
was taken prior to induction to assay for expression and 
plasmid stability as described by Studier et al. (1990). 
Cultures were harvested at 1 h postinduction and stored at 
-70 °C. 

Protein Isolation and Purification. E. coli cells containing 
recombinant protein were thawed slowly and resuspended 
in 100 mL of 10 mM Tris-HCl, pH 7.5/0.5 mM phenyl- 
methanesulfonyl fluoride, and the cells were lysed by 
repeated freezing and thawing along with brief sonication 
on ice. To isolate native protein, Sulfolobus cells were 
suspended in 0.05 M KH2PO4 buffer (pH 6.8) and lysed by 
sonication on ice. DNase I (20 mg/100 mL) was added to 
lysed cells, and the suspension was incubated at 37 °C for 5 
min followed by centrifugation at 180 000g for 60 min. The 
supernatant was cooled on ice and dialyzed in SpectraPor 
CE 1000 MWCO tubing against 0.2 M H2SO4 overnight at 
4 °C. The resulting precipitate was removed by centrifugation 
at 180 000g for 30 min, and the supernatant was dialyzed 
four times against 20 mM Tris-HCl, pH 7.4/1 mM EDTA. 
A small amount of precipitate was removed by centrifugation, 
and the supernatant was applied to a CM-52 ion exchange 
column equilibrated with 20 mM Tris-HCl (pH 7.4). The 
protein was eluted with a linear NaCl gradient (0.0-0.3 M) 
with both the native and recombinant Sac7 proteins giving 
a primary peak at approximately 0.2 M NaCl. Further 
purification was accomplished by gel exclusion chromatography 
on Sephacryl S-100-HR in 0.02 M Tris-HCl (pH 7.4). 

The identity and purity of the 7 kDa proteins were 
monitored by nonreducing SDS gel electrophoresis (Schagger 
& von Jagow, 1987). The recombinant protein showed a 
single band that comigrated with the mixture of Sac7 native 
proteins isolated from S. acidocaldarius (Figure 2) and was 
absent in preparations from control E. coli cells lacking the 
recombinant plasmid (data not shown). The Sso7 proteins 
ran slightly ahead of Sac7 proteins, consistent with a 
molecular weight of 7020 (calculated from the sequence). 
The Schagger-von Jagow gel used here did not resolve the 
individual Sac7 and Sso7 native species. The identity of 
the recombinant Sac7d protein was confirmed by comparison 
of the double-quantum filtered COSY spectra of native Sac7 
and recombinant Sac7d proteins (see below) and by the 
consistency of the sequence specific 1H NMR assignments 
with the expected sequence (Edmondson, Qiu, and Shriver, 
submitted). 

In earlier studies the recombinant protein was isolated by 
a different procedure (McAfee, 1993). E. coli cells were 
lysed and DNase treated as above but without sonication. 
The pH of the supernatant was adjusted to 1.5 with 5 M 
H2SO4. After 45 min on ice and centrifugation, the 
supernatant was neutralized with 10 N NaOH. The mixture 
was incubated in a water bath at 70 °C for 2 h, followed by 
centrifugation. The supernatant was dialyzed three times 
with 1 mM NaH2PO4 buffer (pH 7.0) followed by CM-52 
chromatography as above. 

Molecular Weight Determination. Approximate molecular 
weights of the native and recombinant Sac7 proteins were 
determined by gel exclusion chromatography on Sephacryl 
S-100-HR. Cytochrome c, myoglobin, carbonic anhydrase, 
and bovine serum albumin were used as molecular weight 
standards, and blue dextran and DNP-alanine were used to 
measure the column void and total volumes, respectively. 
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The molecular weights were determined as described by 
Mayes (1984). 
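
A common way to turn gel exclusion data of this kind into a molecular weight estimate is sketched below. The details of the Mayes (1984) procedure are not reproduced in this text, so the calculation is an assumption: it uses the partition coefficient Kav = (Ve - V0)/(Vt - V0), with V0 and Vt the void and total volumes, and a straight-line fit of log10(MW) against Kav for the standards. All elution volumes are hypothetical.

```python
import math


def kav(ve, v0, vt):
    """Partition coefficient from elution (ve), void (v0), and total (vt) volumes."""
    return (ve - v0) / (vt - v0)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(ve, v0, vt, standards):
    """standards: list of (elution_volume, molecular_weight) pairs."""
    xs = [kav(e, v0, vt) for e, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (intercept + slope * kav(ve, v0, vt))
```

In practice the standards would be the four calibration proteins named above, with their measured elution volumes substituted for the hypothetical values.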

Phosphorylation and Glycosylation Assays. Phosphate 
analysis was performed by the method of Fiske and Subbarow 
(Fiske & Subbarow, 1925; Leloir & Cardini, 1957). 
Small aliquots of Sac7 (0.95 mL of a 0.5 mg/mL solution 
in 0.02 M Tris-HCl, pH 7.0) were incubated at 37 °C for 1 
h with 0.05 mL of bovine intestinal alkaline phosphatase 
(25 mg/mL in 0.01 M Tris-HCl, pH 9.8). The protein was 
precipitated with 0.10 mL of concentrated perchloric acid, 
incubated on ice for 10 min, and centrifuged for 5 min at 
13 000 rpm. To 0.90 mL of supernatant was added 2.0 mL 
of distilled water, 1.0 mL of 5 N H2SO4, 1.0 mL of 2.5% 
ammonium molybdate, and 0.10 mL of reducing agent 
[prepared fresh by dissolving 0.25 g of reducing mixture 
(sodium bisulfite, sodium sulfite, and 1-amino-2-naphthol- 
4-sulfonic acid in a 46:46:8 ratio) in 10 mL of water]. The 
solutions were allowed to stand for 20 min, and the 
absorbance was measured at 660 nm. A standard curve was 
prepared using known amounts of a 0.01 M KH2PO4 solution. 
O-Phosphoserine, treated with alkaline phosphatase as 
described for Sac7, gave quantitative recovery of phosphate. 

The phenol-sulfuric acid reaction was used to assay 
carbohydrate content of Sac7 protein (Dubois et al., 1956; 
Hirs, 1967). To 1.0 mL aliquots of Sac7 protein solution 
(0.3 mg/mL) was added 0.25 mL of 80% phenol and 2.5 
mL of concentrated sulfuric acid. After mixing, the solutions 
were left at room temperature for 10 min and then placed in 
a 25 °C water bath for 20 min. The absorbance was 
measured at 489 nm. Known amounts of α-D-glucose were 
used to construct a standard curve. 

Protein Extinction Coefficient. Ultraviolet and visible 
spectra were recorded on a Cary 210 spectrophotometer at 
25 °C. The wavelength accuracy was checked using benzene 
vapor and found to be accurate to within ±0.3 nm, and the 
absorbance accuracy was checked using potassium chromate 
in 0.05 M KOH (Gordon & Ford, 1972) and found to be 
accurate to within 1%. 

The extinction coefficients of both the native Sac7 and 
recombinant Sac7d proteins were determined by measuring 
the amino acid concentration using the ninhydrin reaction 
(Moore & Stein, 1954) for a sample of known absorbance. 
A standard curve was prepared using amino acid standard 
H (Pierce Biochemicals) and converted into leucine molar 
equivalents. The concentration of amino acid standards was 
checked using tyrosine with an extinction coefficient of ε270 
= 1340 in 0.1 M HCl. The molar concentration of amino 
acid residues in the samples was calculated by dividing 
leucine equivalents by the average color yield based on the 
amino acid composition (Moore & Stein, 1954). The average 
color yields for Sac7d, lysozyme, and RNase A were 1.0, 
1.05, and 1.06, respectively. The extinction coefficients of 
lysozyme and RNase A standards were checked by this 
procedure and found to be within 1% of published values. 
The procedure gave an extinction coefficient of 1.03 ± 0.05 
mL/(mg·cm) for both native and recombinant proteins. 

The extinction coefficients were also determined by the 
method of van Iersel et al. (1985) immediately following 
chromatography of the proteins on Sephadex G-50 in 0.01 
M NaH2PO4 buffer (pH 6.5). A flat (±0.0005 absorbance 
units) spectrophotometer baseline was programmed using the 
same buffer which had been used to equilibrate the column. 
Protein spectra were collected on samples directly from the 
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gel exclusion column, generally using only those sai 
with an absorbance less than 2.0 at 205 nm to minimize the 
effects of stray light. The reproducibility of the A280/A205 
ratio using different aliquots collected through the protein 
peak as it eluted from the column was found to be on the 
order of 99%. The linear relationship between the extinction 
coefficient at 280 nm and the ratio of the absorbance at 280 
and 205 nm was confirmed in our hands using bovine 
α-chymotrypsin (Worthington), hen egg white lysozyme 
(Sigma), bovine pancreatic ribonuclease A (Sigma), j 
(Sigma), β-lactoglobulin (Sigma), and bovine serum albumin 
(Sigma). A linear fit of the standards yielded a standard 
curve such that 



\epsilon_{280} = 35.74 \, (A_{280}/A_{205}) - 0.04 

with a correlation coefficient of 0.999 and a standard 
deviation for the slope of 0.62 and 0.03 for the y intercept. 
The extinction coefficients for the native and recombinant 
protein were found to be identical with this technique s 
mL/(mg·cm) with a standard deviation of 0.008 mL/(mg·cm). 
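
As an illustration of how the standard curve is applied, the sketch below evaluates the linear relation between the extinction coefficient at 280 nm and the A280/A205 ratio, using the slope (35.74) and intercept (-0.04) read from the equation above. The function name and the example absorbances are assumptions made for this illustration, and the result is taken to be in mL/(mg·cm) as in the surrounding text.

```python
SLOPE = 35.74      # slope of the standard curve, as read from the text
INTERCEPT = -0.04  # y intercept of the standard curve, as read from the text


def extinction_from_ratio(a280, a205):
    """Extinction coefficient at 280 nm from the A280/A205 absorbance ratio."""
    if a205 <= 0:
        raise ValueError("A205 must be positive")
    return SLOPE * (a280 / a205) + INTERCEPT


# Hypothetical reading: A280 = 0.045 and A205 = 1.5 on the same aliquot
print(round(extinction_from_ratio(0.045, 1.5), 3))
```

Because only the ratio enters the calculation, the protein concentration of the aliquot does not need to be known.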

The extinction coefficients were also calculated to b 
mL/(mg·cm) in 6 M guanidine hydrochloride, based on the 
amino acid content of the protein using the procedure of 
Edelhoch (Edelhoch, 1967; Gill & von Hippel, 1989), 
assuming εTyr = 1280 M^-1 cm^-1 and εTrp = 5690 M^-1 cm^-1 in 
6 M guanidine hydrochloride. An increase in absorbance 
of 3.5% was noted upon denaturation of the protein in 6 
M GdnHCl, so the calculated extinction coefficient of the 
folded protein was corrected to 1.05 mL/(mg·cm). The 
estimated error was taken to be ±0.04 with a maximal error 
of ±0.15 (Gill & von Hippel, 1989). 
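
The Edelhoch-type calculation described above can be sketched as follows. The molar absorptivities for Tyr and Trp and the 3.5% correction come from the text; the residue counts (2 Tyr, 1 Trp) are an assumption about the Sac7d sequence made for this illustration, the molecular weight is the 7608 Da quoted in the abstract, and any cystine contribution is omitted because the proteins are stated to lack cysteine.

```python
EPS_TYR = 1280.0  # M^-1 cm^-1 in 6 M GdnHCl (from the text)
EPS_TRP = 5690.0  # M^-1 cm^-1 in 6 M GdnHCl (from the text)


def molar_extinction(n_tyr, n_trp):
    """Molar extinction coefficient of the denatured protein."""
    return n_tyr * EPS_TYR + n_trp * EPS_TRP


def mass_extinction(n_tyr, n_trp, mol_weight):
    """Extinction coefficient in mL/(mg cm): M^-1 cm^-1 divided by g/mol."""
    return molar_extinction(n_tyr, n_trp) / mol_weight


def folded_extinction(denatured, fractional_increase=0.035):
    """Remove the absorbance increase observed on denaturation."""
    return denatured / (1.0 + fractional_increase)


# Assumed composition: 2 Tyr and 1 Trp; MW 7608 from the abstract
denatured = mass_extinction(2, 1, 7608.0)
folded = folded_extinction(denatured)
```

Under these assumed residue counts the corrected value rounds to 1.05 mL/(mg·cm), in line with the figure quoted in the text.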

Circular Dichroism. Circular dichroism spectra of pi 
native Sac7 and recombinant Sac7d proteins were measured 
at room temperature in a 0.01 cm path length cylindrical 
cell on an AVIV 62DS spectropolarimeter. CD data were 
collected at 1 nm intervals using averaging times of J 
s/nm, depending on the signal-to-noise ratio. Relatively high 
signal-to-noise ratios made signal averaging of multiple scans 
unnecessary. The spectral bandwidth was 1.5 nm. Baselines 
were measured using water and subtracted from the sample 
CD. Sample concentrations ranged from 0.2 to 0.7 m 
Protein concentrations were determined from UV absorbance 
spectra measured in 1 cm cuvettes. The molar CD per 
peptide bond was determined using standard procedures 
(Johnson, 1984) along with the UV extinction coefficient 
determined above. CD spectra were smoothed as described 
by Savitzky and Golay (1964). The CD was calibrated at 
290.5 nm with d-camphor-10-sulfonic acid using Δε = 
2.36, and the ratio Δε192.5/Δε290.5 was -2.10 (Chen & Yang, 
1977). 

The fractions of protein secondary structures were deter- 
mined by fitting the CD spectra from 260 to 184 ni 
nm intervals using the variable selection method of Johnson 
(Manavalan & Johnson, 1987). The results reported are 
averages plus or minus one standard deviation of all possible 
combinations of 22 reference proteins taken 19 at a time 
that (1) have secondary structure components greater than 
-0.05, (2) have sums of secondary structures betwe 
and 1.1, and (3) have an rms error between measured and 
calculated CD spectra less than 0.21 Δε units. The number 
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of fits meeting this selection criteria were greater than 250 
for native and recombinant protein. 

Nuclear Magnetic Resonance. NMR spectra were collected 
on a Varian 500 MHz NMR spectrometer with the 
magnet installed on a TMC Micro-g triangular antivibration 
table. All data were collected at 35 °C in 90% H2O/10% 
D2O, pH 4.1, with a protein concentration of approximately 
10 mM. The pH was adjusted with DCl and NaOD using a 
Radiometer glass electrode and was not corrected for the 
deuterium isotope effect (Bundi & Wuthrich, 1979). The 
chemical shifts are referenced to the water resonance at 4.73 
ppm at 35 °C [measured relative to sodium 4,4-dimethyl- 
4-silapentane sulfonate (DSS) in a separate experiment 
without protein]. 

Phase-sensitive double-quantum filtered COSY (DQF- 
COSY) spectra were collected using standard procedures 
(Rance et al., 1983). Typically, 1024 data points were 
collected in the t2 domain with 512 increments in the t1 
domain, each the sum of 32 scans with a 3 s relaxation delay. 
The spectral width in both dimensions was 6000 Hz. The 
water peak was diminished in all experiments by presaturation 
during the relaxation delay. Both carrier and decoupler 
frequencies were set equal to the water resonance frequency 
in all experiments (Zuiderweg et al., 1986). 

The NMR data were transferred to a Silicon Graphics 
workstation for Fourier transformation and further data 
manipulation using FELIX 2.1 (BioSym). The data were 
zero-filled to 2048 data points in both dimensions and treated 
with a Lorentzian to Gaussian apodization function prior to 
Fourier transformation. 

Differential Scanning Calorimetry. Differential scanning 
calorimetry was performed with a Microcal MC2 calorimeter. 
Temperature calibration was monitored using sealed samples 
supplied by Microcal. Heat flow accuracy was periodically 
monitored by applying pulses of known magnitude using the 
internal heater. In addition, ribonuclease A (Sigma, R525b) 
was used as a benchmark test protein and shown to unfold 
at pH 2.2 [0.1 M KCl, 0.02 M glycine, ε280 = 0.69 mL/ 
(mg·cm), MW 13 700] with a Tm of 36.0 °C, a ΔHcal of 74.1 
kcal/mol, and a ΔHvH of 74.8 kcal/mol (ΔHcal/ΔHvH ratio of 
1.00 ± 0.01), in good agreement with the published values 
of Tiktopulo and Privalov (1974). 

Protein solutions were exhaustively dialyzed against the 
indicated buffer overnight. The sample cell was loaded with 
1.229 mL of protein solution, and the reference cell was filled 
with the last dialysis buffer. Approximately 30 psi of 
nitrogen was applied to the cells during each scan to 
minimize degassing during heating. Samples were not 
degassed, but, instead, the sample was heated repetitively 
three times in the DSC instrument by scanning to 35 °C (i.e., 
below any denaturation endotherm), followed by rapid 
cooling. This procedure resulted in the flattest and most 
reproducible instrumental baselines. 
All DSC experiments were under computer control using 
an IBM PC computer interfaced to the Microcal MC2 
instrument. A scan rate of 1 deg/min was used in all 
experiments. The computer interface and data collection 
software were supplied by Microcal. Multiple, repetitive 
scans were performed on the same sample to check for 
reversibility, with identical cooling and equilibration times 
between scans. 
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The DSC raw data, in the form of heat flow (mcal/min) 
as a function of temperature, was transferred to a Macintosh 
Quadra computer for analysis. The raw data were converted 
to excess heat capacity (kcal/deg-mol) by dividing each data 
point by the scan rate and the concentration of protein in 
the sample cell. All baselines were corrected by subtraction 
of DSC scans of the buffer against which the protein had 
been dialyzed. The heat capacity data was fit by using in- 
house nonlinear least-squares fitting routines to obtain the 
midpoint temperature of the transition and both the calori- 
metric and van't Hoff enthalpies. The basis of the programs 
has been described elsewhere (Shriver & Kamath, 1990). 
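
The conversion from raw heat flow to excess heat capacity described above can be sketched as below. This is not the in-house fitting software: the function and parameter names are assumptions, and the "concentration of protein in the sample cell" is interpreted here as the number of moles in the cell (molar concentration times cell volume), which is what yields units of kcal/(deg·mol).

```python
def excess_heat_capacity(sample_mcal_per_min, buffer_mcal_per_min,
                         scan_rate_deg_per_min, conc_molar, cell_volume_ml):
    """Convert raw DSC heat flow (mcal/min) to excess heat capacity
    in kcal/(deg mol), after subtracting the buffer-buffer baseline."""
    moles = conc_molar * cell_volume_ml * 1e-3  # mol of protein in the cell
    result = []
    for sample, buffer in zip(sample_mcal_per_min, buffer_mcal_per_min):
        mcal_per_deg = (sample - buffer) / scan_rate_deg_per_min
        result.append(mcal_per_deg * 1e-6 / moles)  # 1 mcal = 1e-6 kcal
    return result


# Hypothetical data points for a 1 mM sample in a 1 mL cell at 1 deg/min
cp = excess_heat_capacity([1.2, 2.2], [0.2, 0.2], 1.0, 1e-3, 1.0)
```

The resulting curve is what would then be passed to a nonlinear least-squares routine to extract the transition midpoint and the calorimetric and van't Hoff enthalpies.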

Fluorescence. Fluorescence titration measurements were 
performed on an SLM 8000C spectrofluorimeter with 4 nm 
excitation and 8 nm emission slit widths. Binding titrations 
were performed with excitation at 295 nm and emission 
monitored at 350 nm. Reverse titrations were performed by 
adding aliquots of concentrated nucleotide solutions to a 
known concentration of protein in a 4 mL fluorescence quartz 
cell with stirring using a magnetic "flea" within the cell. 
Nucleic acid concentrations were determined spectropho- 
tometrically using an extinction coefficient of 8400 L/(cm·mol) 
for poly[dGdC]-poly[dGdC] (Wells, 1970) and 66O6 
L/(cm·mol) for poly[dAdT]-poly[dAdT] (Inman, 1962). All 
experiments were performed at 25 °C. The fluorescence 
intensity was constant at high DNA concentrations, and thus 
no correction was made for the inner filter effect. Apparently, 
any decrease in fluorescence due to the inner filter 
effect was balanced by other effects, such as scattering by 
the DNA-protein complexes. Photobleaching was not ob- 
served during the titrations. Binding parameters were 
obtained by using a simple, noncooperative McGhee-von 
Hippel model (McGhee & von Hippel, 1974). 
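
For readers unfamiliar with the noncooperative McGhee-von Hippel model, a minimal sketch of its binding isotherm follows. It uses the standard form nu/L = K (1 - n nu) [(1 - n nu)/(1 - (n - 1) nu)]^(n - 1), where nu is the binding density (bound protein per base pair), L the free protein concentration, K the intrinsic binding constant, and n the site size. The function names and the bisection solver are choices made for this illustration, not the fitting procedure used by the authors.

```python
def free_ligand(nu, K, n):
    """Free ligand concentration L that corresponds to binding density nu."""
    if not 0.0 < nu < 1.0 / n:
        raise ValueError("nu must lie between 0 and 1/n")
    vacant = 1.0 - n * nu
    return nu / (K * vacant * (vacant / (1.0 - (n - 1.0) * nu)) ** (n - 1.0))


def binding_density(L, K, n, iterations=200):
    """Invert free_ligand by bisection; L increases monotonically with nu."""
    lo, hi = 0.0, 1.0 / n
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mid <= lo or mid >= hi:  # interval collapsed to machine precision
            break
        if free_ligand(mid, K, n) < L:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

A non-integer site size such as the 3.5 base pairs quoted in the abstract can be passed directly as n; for n = 1 the expression reduces to a simple independent-site isotherm.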

DNA Stabilization. Thermal denaturation studies of DNA 
and DNA— protein complexes were performed on a Cary 210 
spectrophotometer equipped with water-jacketed cuvette 
holders and a circulating water bath calibrated to within ±04- 
°C. Melting curves are scaled to an A262 of 1.0 at 20 °C for 
the DNA component of DNA-protein mixtures. 

Sequence Analysis. BLAST (Altschul et al., 1990) searching
and alignment were performed using the NCBI server
(blast@ncbi.nlm.nih.gov) against the "nr" (nonredundant)
sequence database (including Brookhaven Protein Data Bank,
January 1994 release; SWISS-PROT Release 29.0, June
1994; PIR Release 41.0, June 30, 1994; CDS Translations
from GenBank Release 83.0, June 15, 1994; Kabat Sequences
of Proteins of Immunological Interest Release 5.0, August
1992; TFD Transcription Factor Database Release 7.6, June
1993). BLITZ and FASTA searches of the latest SWISS-PROT
database were performed using the EMBL servers
(blitz@embl-heidelberg.de and fasta@embl-heidelberg.de).
Database retrieval was performed using the GDB/Accessor
(Johns Hopkins University) available from ftp.gdb.org.
MacPattern (Fuchs, 1991) (fuchs@embl-heidelberg.de) was
utilized for BLOCKS (Henikoff & Henikoff, 1991) and
PROSITE (Bairoch, 1992) analysis on a Quadra 700
(BLOCKS database Version 7.01 was utilized with 2679
entries and PROSITE database version 12.0, June 1994, was
used with 1021 entries, both obtained from the NCBI ftp
site ncbi.nlm.nih.gov.) The MacVector software package
(IBI) was utilized for protein secondary structure analysis.

Table 1: List of Oligonucleotides

oligonucleotide(a)   sequence(b)                      position(c)
A                    NACYTCYTTYTCYTCNCC               230-247
B                    GGGAGCTTYAARTAYAARGGNGARGA(d)    218-237
C                    GGGGTACCRTTRTCRTCRTANGTRAA(d)    296-317
D                    TCTTAACAAATTATTTTATTT            398-418
E                    GCCCTTTATACCTTCCCCTTA            398-418
F                    CCTGTCTTACCATTGTCGTC             305-324
G                    CCTTCACCATATGAGGTCAAGTTATC(e)    187-212
H                    GACTTAACTTAATACCG                143-159

(a) Oligonucleotides A, B, and C were derived from amino acids 9-14,
5-11, and 31-38, respectively, of the Sac7 proteins (Figure 1). These
amino acid sequences are identical in the four Sac7 proteins.
(b) N = A, G, C, or T; Y = C or T; R = A or G.
(c) Nucleotide positions correspond to those in Figure 3. Sequences of
oligonucleotides A, C, D, E, F, and G are complementary to the
sequences shown in Figure 3. Oligonucleotides D and E correspond to
the same positions (Figure 3) for sac7d and sac7e, respectively.
(d) Oligonucleotides B and C have six and four additional nucleotides,
respectively, at the 5' termini which are not derived from the amino
acid sequence of the protein.
(e) Sequence of the primer used for oligonucleotide directed
mutagenesis. The G of the CATATG hexanucleotide (underlined in the
original) replaces a T in the sac7d gene sequence, creating an NdeI
restriction site.
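
The degenerate codes in Table 1 can be checked mechanically. The sketch below, whose helper names are illustrative and not part of the original work, counts how many distinct sequences a mixed probe represents and forms the reverse complement of an oligonucleotide, using only the three ambiguity codes defined in the table footnote (N, Y, R).

```python
from math import prod

# Bases represented by each code used in Table 1 (N, Y, R per the footnote).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

# Complement of each code; R (A/G) pairs with Y (T/C), N with N.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "R": "Y", "Y": "R", "N": "N"}


def degeneracy(oligo):
    """Number of distinct sequences in a mixed oligonucleotide pool."""
    return prod(len(IUPAC[base]) for base in oligo)


def reverse_complement(oligo):
    """Reverse complement, preserving degenerate positions."""
    return "".join(COMPLEMENT[base] for base in reversed(oligo))


if __name__ == "__main__":
    probe_a = "NACYTCYTTYTCYTCNCC"        # oligonucleotide A
    oligo_f = "CCTGTCTTACCATTGTCGTC"      # oligonucleotide F
    print(degeneracy(probe_a))            # size of the probe A mixture
    print(reverse_complement(probe_a))    # coding-strand codons for residues 9-14
    print(reverse_complement(oligo_f))    # coding-strand sequence at 305-324
```

The reverse complement of probe A reads GGN GAR GAR AAR GAR GTN, the degenerate codons for Gly-Glu-Glu-Lys-Glu-Val, which is consistent with the footnote stating that A is derived from amino acids 9-14 and is complementary to the sequence in Figure 3.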



RESULTS 

Gene Cloning and Sequence. PstI digested genomic DNA
of S. acidocaldarius RGJM was shotgun cloned in the vector
pUC19 and transformed into E. coli DH5αF'IQ. Approximately
10 000 transformants were screened by colony
hybridization to a mixed oligonucleotide probe (oligonucleotide
A, Table 1) derived from residues 9-14 of the
published amino acid sequence of the S. acidocaldarius 7
kDa proteins (Kimura et al., 1984; Choli et al., 1988a). [The
published amino acid sequences for Sac7a, b, d, and e are
identical over this range (Figure 1) as well as over the ranges
for oligonucleotides B and C.] Tentative positive clones
were restreaked onto selective media and screened a second
time with the same probe. Plasmids isolated from a number
of these positive clones were then independently hybridized
to three different mixed probes (oligonucleotides A, B, and
C, Table 1) by dot blot hybridization. Two clones were
isolated which hybridized to all three probes. Plasmids
isolated from these cells were partially sequenced using
oligonucleotide B as a primer. One of the genes corresponded
with the published protein sequence for the
carboxy-terminal half of the Sac7d protein of S. acidocaldarius
(Kimura et al., 1984; Choli et al., 1988a) with the
exception of one additional lysine at the carboxy terminus,
and the other corresponded to the Sac7e sequence. The genes
which matched the Sac7d and 7e proteins have been
designated sac7d and sac7e, respectively.

Agarose gel analysis of the plasmids carrying the sac7d
(pUC19/sac7d) and sac7e (pUC19/sac7e) genes indicated
that the cloned PstI fragments were greater than 15 kb in
size. Southern blot hybridizations of oligonucleotide C to
the restriction digests of pUC19/sac7d indicated that the sac7d
gene was present on a slightly less than 800 bp EcoRI
fragment. Preliminary sequencing of pUC19/sac7d using
oligonucleotide B as a primer indicated the presence of an
EcoRI site 61 bases downstream of the termination codon
of the protein. Since the published sequence of Sac7d protein
consists of 64 amino acids (Kimura et al., 1984; Choli et
al., 1988a), the second EcoRI site was expected to be
upstream of the start codon. Thus, the EcoRI fragment



hybridizing to probe C was expected to contain the
coding region of the gene. This EcoRI fragment was
subcloned in the vector pBluescript KS+ to produce pBluescript
KS+/sac7d, and the sequence of the sac7d gene was
determined (Figure 3). The sequence of the sac7e gene
(Figure 3) was obtained directly from the pUC19/sac7e using
primers complementary to the coding region of the gene.
The GenBank accession numbers for the sac7d and sac7e
gene sequences reported here are M87569 and LC
respectively.

Sequence Analysis and Gene Copy Number. The start of
transcription for both sac7d and sac7e genes was determined
using primer extension analysis (Figure 4). Specific primers
(oligonucleotides D and E, Table 1) that were complementary
to residues 398-418 (Figure 3) of the two genes were used.
A single start site was observed for each of the two genes,
which occurs on a guanosine residue eight nucleotides
upstream from the initiation codon. These guanosine
residues are present within perfect archaeal "B box' 

A A ^ 

sensus sequences (consensus -TG- (Zillig et al., 191 
sequence resembling the archaeal "A-box" motif (consensus
TTTA(A/T)A) is seen 24 and 23 nucleotides upstream from the
transcription start site for the sac7d and sac7e genes,
respectively (Figure 3). The "A-box" of sac7d has
base match with the consensus sequence, while that for
sac7e has only four matches.

Oligonucleotide F (Table 1) was used to probe genomic
blots of three S. acidocaldarius (RGJM, DG6, and DSM639)
and two S. solfataricus (DSM5354 and P2) strains (Figure
5A). Oligonucleotide F is complementary to a region coding
for residues 34-40 (Figure 1) which are identical for all
S. acidocaldarius 7 kDa proteins (DDNGKTG) and significantly
different from that of S. solfataricus (DEGGGKTG;
two substitutions and an insertion). Two HindIII restriction
fragments (~3.0 and ~4.6 kb) were recognized by the probe
in all three S. acidocaldarius strains, while no hybridization
to the S. solfataricus strains was observed. This observation
reinforces the assignment of the RGJM strain (our laboratory
strain) as an S. acidocaldarius strain. The results indicate
that the putative genes encoding all of the Sac7 proteins are
present on the two HindIII restriction fragments of ~3.0 and
~4.6 kb in size. Genomic blots of EcoRI, HindIII, and PstI
digested S. acidocaldarius RGJM DNA were also probed
with the common oligonucleotide F (Figure 5B), and in each
case hybridization to two bands was observed. One band
in each hybridized to oligonucleotide H, specific for the
untranscribed region upstream of the sac7d gene (Figure 5C).
Results of the hybridizations of various restriction digests
of the original pUC/sac7d and pUC/sac7e clones to appropriate
oligonucleotides (data not shown) corroborated the
results in Figure 5 and also indicated that the original clones
had a single copy of a sac7 gene. The 3.0 and 4.6 kb HindIII
fragments can be correlated with the sac7d and sac7e genes,
respectively. The data indicate that there are only two sac7
genes in the S. acidocaldarius genome, each being present in a
single copy. This reinforces the conclusion that Sac7a and
Sac7b are proteolytically truncated versions of the Sac7d
protein.

Protein Sequence Analysis. The sac7d open reading frame
can encode a 66 amino acid protein with a calculated
molecular weight of 7608, and the sac7e encodes a 65 amino
acid protein with a calculated molecular weight of



Residues 1-15
Sac7a  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7b  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7d  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7e  Ala-Lys-Val-Arg-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sso7d  Ala-Thr-Val-Lys-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-

Residues 16-30
Sac7a  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7b  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7d  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7e  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sso7d  Ile-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Ile-Ser-

Residues 31-45
Sac7a  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7b  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7d  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7e  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sso7d  Phe-Thr-Tyr-Asp-Glu-Gly-Gly-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-

Residues 46-60
Sac7d  Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala-Arg-Ala-
Sac7e  Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Met-Asp-Met-Leu-Ala-Arg-Ala-

[Residues 61-66, the Sso7d row for residues 46-60, and the C-terminal
end points of Sac7a and Sac7b are not legible in the source.]
Figure 1: Amino acid sequences of the Sac7a, b, d, and e proteins [after Kimura et al. (1984) and Choli et al. (1988b)] and the Sso7d
protein [after Choli et al. (1988a)]. [Note that the sequence reported by Kimura et al. (1984) was claimed to be for Sso7d but was later
shown to be for Sac7d (Choli et al., 1988a).] Numbering is according to the Sac7d sequence without the initiator methionine. Regions
homologous to the Sac7d protein are outlined. Sac7a, b, and d differ only in length. Lysines which are monomethylated to some extent in
the native protein are indicated with asterisks. The additional C-terminal lysine coded by the sac7d gene described here which was not
indicated in the published protein sequence is enclosed in parentheses.

Gly43 to Ala59. Only the Chou-Fasman algorithm predicts
a small amount of β-sheet (12%) extending from Lys22 to
Lys29 and from Ser31 to Asp36. Reverse turns are predicted
near Asp36 and Gly43. These predictions are not consistent
with the solution structure of the Sac7d protein which has
been determined by 2D NMR (Edmondson, Qiu, and Shriver,
manuscript submitted).

Recombinant Gene Expression. The sac7d gene (in
pBluescript KS+/sac7d) was modified by converting the
hexanucleotide sequence containing the initiation codon
(AATATG) to an NdeI site (CATATG) by oligonucleotide
G (Table 1) directed mutagenesis to produce pBluescript
KS+/sac7d(Nd). The NdeI-BamHI fragment of pBluescript
KS+/sac7d(Nd) carrying the coding region of the sac7d gene
was then subcloned into the NdeI-BamHI site of pET-3b
(Studier et al., 1990) to give pET-3b/sac7d, and transformed
into HMS174 (DE3), HMS174 (DE3) pLysS, BL21 (DE3),
and BL21 (DE3) pLysS (Studier et al., 1990). The plasmid
could be established in all of these strains except BL21
(DE3). Furthermore, in transformed BL21 (DE3) pLysS,
the growth of the organism is impaired and cultures lyse
within 60-70 min after induction with IPTG. On the other
hand, the growth of HMS174 strains was not significantly
affected by the presence of the plasmid, and lysis was not
observed in cultures 3 h postinduction. The absence
of impaired growth in the presence of the plasmid in these




Figure 2: Schagger and von Jagow (1987) polyacrylamide
nonreducing SDS gel of purified native Sac7 proteins (lane 1),
recombinant Sac7d (lane 2), and native Sso7 (lane 3) proteins
stained with Coomassie Brilliant Blue G-250 (Bio-Rad). The
molecular weight of the Sso7 protein is 7019 based on the published
protein sequence (Choli et al., 1988a), while that of the Sac7d is
7608 based on the DNA sequence presented here. The band
positions of myoglobin (MW 16 900) and insulin (MW 5780) are
indicated for comparison.

(including initiator methionines). Secondary structure analysis
of the sequences of the Sac7d and Sac7e proteins was
performed with both the Chou-Fasman (Chou & Fasman,
1974, 1978) and the Robson-Garnier algorithms (Robson
& Suzuki, 1976; Garnier et al., 1978). Both methods predict
the occurrence of significant α-helix (52%) in both proteins
extending from approximately Lys9 to Lys28 and from

[Figure 3 sequence panel: aligned nucleotide sequences of the sac7d (top) and sac7e (bottom) genes with their encoded amino acid sequences and the A-box annotated; the sequence text is not recoverable from the scan.]


Figure 3: Nucleotide sequences of the sac7d and sac7e genes.
The top and bottom sequences are the nucleotide sequences for the
sac7d and sac7e genes, respectively (aligned using the coding region
of each gene). Numbering starts with the sac7e sequence. The amino
acid sequence coded for by each gene is shown above (sac7d) or
below (sac7e) each nucleotide sequence. Putative promoter (A- and
B-boxes) and termination elements are underlined in the 5' and 3'
noncoding regions of each sequence. Amino acid and nucleotide
differences in the coding region of each gene are also indicated by
underlines. The G at the start of transcription (in the B-box) for
each gene is indicated with an asterisk.

strains was correlated with a lack of Sac7d protein accumulation.
In contrast to HMS174 strains, BL21 and its
derivatives lack the ompT outer membrane protease and are
deficient in the lon protease (Studier et al., 1990). The
ompT protease has been shown to be responsible for T7 RNA
polymerase degradation during protein purification from E.
coli (Grodberg & Dunn, 1988). Thus, it appears that in the
absence of stringent regulation of T7 RNA polymerase
synthesis prior to induction with IPTG, or proteolytic
degradation of the Sac7d protein, the protein accumulates
to lethal levels. However, because significant amounts of
the Sac7d protein do not accumulate in HMS174 strains, we
have utilized BL21 (DE3) pLysS for subsequent expression
and purification of the protein.

Spectroscopic and Chemical Characterization. The UV
spectra of native and recombinant Sac7 proteins were
essentially identical, as expected, given the presence of a
single tryptophan, two tyrosines, and two phenylalanines
in all proteins. The calculated extinction coefficient based
on amino acid composition is 1.05 mL/(mg·cm) at 280 nm,
in good agreement with the value of 1.03 mL/(mg·cm)
determined by ninhydrin analysis. The extinction coefficients
were also determined by using the ratio of absorbance
at 280 and 205 nm (see Materials and Methods). The




Figure 4: Determination of the in vivo start of transcription of
the sac7d and sac7e genes by primer extension analysis. sac7d (lane
d) and sac7e (lane e) specific oligonucleotides D and E, respectively
[which are complementary to residues 398-418 (Figure 3)], were
used to prime the synthesis of a complementary strand of DNA
from total S. acidocaldarius RNA. These same oligonucleotides
were also primers in the dideoxy sequencing reactions used as
markers for the sac7d (pBSKS+/sac7d) and sac7e genes (pUC19/
sac7e) indicated. The sequences written on the left and right are
complementary to the ones observed in the autoradiogram in the
marked region. The start of transcription is indicated in each
sequence by an asterisk. The first five coded amino acids of each
protein are also indicated along side each complementary strand
sequence.

empirical nature of this method might lead to some question
of its accuracy, but the high correlation of the results from
the six standards is extraordinary (r = 0.999), and the
reproducibility of the A280/A205 ratio measurement is high,
leading to an expected error of 0.6%. The ratio method
demonstrates that the extinction coefficients of the native
and recombinant protein are identical, viz., the mean of
extinction coefficient measurements (native and recombinant
combined) using this method was 1.18 mL/(mg·cm) with a
standard deviation of 0.008 mL/(mg·cm). The final extinction
coefficient for both the recombinant and native proteins
is taken to be 1.09 mL/(mg·cm), the mean of the three
independent measurements, with a standard error of ±(
(calculated by propagating the errors of the three measurements).
The extinction coefficient was shown to be pH
independent from 2 to 10.
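
The arithmetic behind the final extinction coefficient can be sketched as follows: the three quoted values (1.05 from amino acid composition, 1.03 from ninhydrin analysis, and 1.18 from the A280/A205 ratio) are averaged, and the result is then used in the Beer-Lambert law to convert an absorbance reading to a protein concentration. The function names and the example absorbance are illustrative assumptions; the propagated standard error reported in the text is not reproduced here.

```python
from statistics import mean

# Extinction coefficients at 280 nm, in mL/(mg cm), as quoted in the text.
MEASUREMENTS = {
    "amino acid composition": 1.05,
    "ninhydrin analysis": 1.03,
    "A280/A205 ratio": 1.18,
}


def mean_extinction(values):
    """Unweighted mean of the independent extinction coefficient estimates."""
    return mean(values)


def concentration_mg_per_ml(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * path length)."""
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    eps = mean_extinction(MEASUREMENTS.values())
    print(round(eps, 2))                                   # 1.09
    # Hypothetical reading of A280 = 0.545 in a 1 cm cell:
    print(concentration_mg_per_ml(0.545, round(eps, 2)))   # about 0.5 mg/mL
```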
The fluorescence excitation and emission spectra of the
native Sac7 and recombinant Sac7d proteins were
essentially identical (data not shown). In addition, the
fluorescence emission spectrum was essentially that expected

Figure 5: Southern analysis of Sulfolobus genomic DNA. (A) Autoradiogram of a Southern blot of HindIII digests of genomic DNA from
S. acidocaldarius (RGJM) (lane 1), S. acidocaldarius (DG6) (lane 2), S. acidocaldarius (DSM639) (lane 3), S. solfataricus (DSM5354)
(lane 4), and S. solfataricus (P2) (lane 5) probed with oligonucleotide F. The approximate sizes of the restriction fragments hybridizing to
oligonucleotide F are indicated. (B) Autoradiogram of a Southern blot of EcoRI (lane E), HindIII (lane H), and PstI (lane P) digested S.
acidocaldarius RGJM genomic DNA hybridized with oligonucleotide F. Two closely spaced bands in lane P are clearly evident in the
original autoradiogram. Lane E* is a second independent EcoRI experiment to clearly demonstrate the 0.8 kb fragment. (C) Similar to
panel B except that the DNA was probed with oligonucleotide H.

at 4.7 ppm indicates the presence of significant β-sheet
structure (Wishart et al., 1992). The wide chemical shift
dispersion has permitted an essentially complete assignment
of the proton resonances and determination of the solution
structure (Edmondson, Qiu, and Shriver, manuscript submitted).

No phosphorylation or glycosylation of either the native
or recombinant proteins could be detected. The recombinant
protein differs from the native by containing the initiator
methionine. The recombinant protein also contains an
additional C-terminal lysine which was not reported in the
amino acid sequence (Kimura et al., 1984), although it
remains to be determined if this is an error in the protein
sequence or if the lysine is actually removed posttranslationally.

DNA Binding. The binding of Sac7 proteins to DNA is
associated with a significant quenching of the intrinsic
fluorescence of the single tryptophan (Trp23) in both the
native and recombinant Sac7 proteins (Figure 8). Binding
of poly[dGdC]·poly[dGdC] in 0.01 M KH2PO4 at pH 7.0
leads to a maximal fluorescence quenching of the native
protein by 88% and the recombinant Sac7d protein by 87%.
Poly[dAdT]·poly[dAdT] shows a maximal quenching of 84%
for both proteins (data not shown). The binding data can
be fit using the McGhee and von Hippel model (McGhee
and von Hippel, 1974) without cooperative interactions,
assuming a linear relationship between fractional quenching
and protein binding. The poly[dGdC]·poly[dGdC] data can
be fit with an intrinsic association constant of 2 x 10^ M"*
for both native and recombinant Sac7d protein and site sizes
of 7 bases (3.5 base pairs) and 6.8 bases for native and
recombinant protein, respectively. Poly[dAdT]·poly[dAdT]
appears to bind slightly more weakly, with an association constant
of 1 X 10' M"* for both proteins and site sizes of 7.5 bases
for native protein and 6.8 bases for recombinant protein.

The binding of Sac7 to poly[dAdT]·poly[dAdT] significantly
stabilizes the DNA double helix against thermal
denaturation. The UV melting curve of poly[dAdT]·poly[dAdT]
in 0.01 M KH2PO4 is very sharp and has a Tm of
43.5 °C (Figure 9). In the presence of native Sac7d protein,



Figure 6: Circular dichroism spectra of native Sac7 (solid line,
0.26 mg/mL) and recombinant Sac7d (dashed line, 0.66 mg/mL)
proteins in 0.01 M KH2PO4, pH 7.0.

for a free tryptophan, indicating that the single tryptophan
is highly solvent exposed in both proteins. Notably, the
fluorescence emission spectra show a small shift upon
DNA binding (data not shown), indicating that the exposure
of the tryptophan changes slightly upon DNA binding. The
CD spectra of native Sac7 and recombinant Sac7d proteins
were also essentially identical (Figure 6). The variable
selection method of Johnson (Manavalan & Johnson, 1987)
indicates that both the native and recombinant Sac7 proteins
are composed of 31% helix (both α- and 3₁₀-helix), 22-25%
β-sheet, 0-2% turn, and 42-45% nonrepetitive structure.

The DQF-COSY spectra of the native and recombinant
Sac7 proteins are remarkably similar (Figure 7). The native
spectrum shows some additional correlation peaks, most
likely due to the presence of 7a, b, c, d, and e isoforms in
the native preparation and posttranslational modifications
(e.g., monomethylation of lysines) in Sulfolobus. The
essential identity of the chemical shifts for the native and
recombinant proteins indicates again that the recombinant
and native proteins are folded similarly. The extensive
number of alpha protons shifted downfield of the water line

Figure 7: Double-quantum filtered (DQF-COSY) α to amide
proton correlation spectra of the native Sac7 (A) and recombinant
Sac7d (B) proteins at 35 °C in 90% H2O/10% D2O, pH 4.1. The
protein concentrations in both spectra were approximately 10 mM.

the melting profile of poly[dAdT]·poly[dAdT] broadens and
the Tm increases. At the highest protein concentration used
in this series of experiments, the DNA melting temperature
was increased about 33 °C above that of poly[dAdT]·poly[dAdT]
alone. The recombinant protein increases the Tm of
poly[dAdT]·poly[dAdT] by a similar amount. However, the
recombinant protein differs in that it aggregates as the
double-stranded poly[d(AT)] melts. CD measurements of the
suspension, and the supernatant after allowing the aggregate
to settle, indicate no major conformational changes during
aggregation of the protein-DNA mixture.

Figure 8: Reverse titrations of the native Sac7 (solid circles) and
recombinant Sac7d (open circles) proteins with poly[dGdC]·poly[dGdC]
at pH 7.0 (0.01 M KH2PO4), 25 °C with 6.6 µM Sac7
proteins and 7.3 µM Sac7d. The smooth curves through the data
are overlays of simulations using a noncooperative McGhee-von
Hippel model (McGhee & von Hippel, 1974). For the native Sac7
proteins this corresponds to a site size of 7 bases (3.5 base pairs),
maximal quenching of 88%, and an intrinsic association constant
of 2 X 10'' M"^. For the recombinant Sac7d protein this corresponds
to a site size of 6.8 bases (3.4 base pairs), maximal quenching of
87%, and an association constant of 2 x lO'' M"*.



[Figure 9 plot; x-axis: Temperature (°C), 20-100; y-axis ticks at 1.0 and 1.5.]

Figure 9: Thermal denaturation of poly[dAdT]·poly[dAdT] monitored
by changes in UV absorbance at 262 nm in 0.01 M KH2PO4,
pH 7.0. The melting of poly[dAdT]·poly[dAdT] is shown alone
(open triangles), with native Sac7 proteins (solid circles), and with
recombinant Sac7d (open circles). The concentration of
poly[dAdT]·poly[dAdT] was 70 µM (nucleotides), and the concentration
of protein was 350 µM.

Thermal Stability. Sac7 proteins are highly thermostable,
as expected from their origin. Native Sac7 and recombinant
Sac7d samples heated to 100 °C showed no precipitation or
cloudiness, although some increase in scattering was noticeable
in the UV spectrum. The proteins unfold reversibly as
indicated by the observation of similar endotherms with
repetitive DSC scans up to 100 °C.

The native Sac7 proteins show a DSC endotherm at pH
6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA) with a Tm
of 99.0-100.2 °C (data not shown). By comparison, the
native Sso7 protein has a Tm of 99.4 °C under similar
conditions (data not shown). A precise midpoint for the
unfolding transition is difficult to define since data above

Figure 10: Differential scanning calorimetry (DSC) of native Sac7
(solid circles) and recombinant Sac7d (open circles) proteins at pH
4.0 (0.3 M KCl, 0.05 M potassium acetate). Protein concentrations
were 1.5 mg/mL of native Sac7 proteins and 1.38 mg/mL of
recombinant Sac7d. Smooth curves through the data are nonlinear
least-squares fits with Tm = 80.3 °C, ΔHcal = 53.0 kcal/mol, ΔHvH
= 49.6 kcal/mol for the recombinant protein; and Tm = 86.8 °C,
ΔHcal = 56.4 kcal/mol, ΔHvH = 60.3 kcal/mol for the native protein.

100 °C cannot be collected in water in the MC2 calorimeter.
Notably, the unfolding of the native Sac7 proteins is
remarkably reversible, as indicated by essentially 100%
reproducibility of successive scans on the same sample
following cooling. The recombinant Sac7d protein unfolds
at pH 6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA)
with a Tm of 92.7 °C, or approximately 7 °C less than the
native.

A reliable analysis of the DSC endotherms requires a more
complete delineation of the endotherm, which can be obtained
by lowering the pH and increasing the salt concentration to
shift the endotherms to lower temperature. At pH 4.0 (0.05
M potassium acetate, 0.3 M KCl) the native protein unfolds
with a Tm of 86.8 °C (Figure 10). The endotherm can be fit
with a van't Hoff enthalpy of 60.3 kcal/mol and a calorimetric
enthalpy of 56.4 kcal/mol, i.e., a ΔHcal/ΔHvH of 0.94,
indicating that the native protein exists as a monomer under
these conditions and unfolds in an all-or-none fashion with
no significant, populated intermediates.

The recombinant Sac7d protein similarly unfolds reversibly
at pH 4.0 (0.05 M potassium acetate, 0.3 M KCl) but with
a midpoint temperature of 80.3 °C (Figure 10), or 6.5 °C
less than the native protein. It unfolds with a van't Hoff
enthalpy of 49.6 kcal/mol and a calorimetric enthalpy of
53.0 kcal/mol, i.e., a ΔHcal/ΔHvH of 1.07. The identity, within
experimental error, of the calorimetric and van't Hoff
enthalpies indicates that the recombinant protein also exists
as a monomer under these conditions and unfolds via a
two-state reaction.
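
The two-state argument rests on the ratio of calorimetric to van't Hoff enthalpy being close to 1. The sketch below recomputes that ratio from the quoted enthalpies and, as a further illustration, evaluates the fraction unfolded for a two-state transition from the integrated van't Hoff equation. Treating the enthalpy as temperature independent (no heat capacity change) is a simplifying assumption of this sketch, and the function names are illustrative; this is not the nonlinear least-squares fit used for Figure 10.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def cooperativity_ratio(dh_cal, dh_vh):
    """Ratio of calorimetric to van't Hoff enthalpy; ~1 suggests two-state."""
    return dh_cal / dh_vh


def fraction_unfolded(temp_c, tm_c, dh_vh):
    """Two-state fraction unfolded, assuming a temperature-independent dH.

    K(T) = exp(-(dH/R) * (1/T - 1/Tm)), so K = 1 and f = 0.5 at T = Tm.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    k = math.exp(-(dh_vh / R_KCAL) * (1.0 / t - 1.0 / tm))
    return k / (1.0 + k)


if __name__ == "__main__":
    print(round(cooperativity_ratio(56.4, 60.3), 2))  # native: 0.94
    print(round(cooperativity_ratio(53.0, 49.6), 2))  # recombinant: 1.07
    print(fraction_unfolded(86.8, 86.8, 60.3))        # 0.5 at the midpoint
```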

DISCUSSION

We report here the cloning and sequencing of two genes
from S. acidocaldarius coding for Sac7 proteins which
correspond to Sac7d and Sac7e. The sac7d and sac7e genes
differ at only 16 positions within the coding region (underlined
in Figure 3); three of these differences are transversions,
while the rest are transitions. The sac7d and sac7e genes
code for 66 and 65 amino acid proteins, respectively. The
deduced amino acid sequences are in complete agreement
with the published sequences for both proteins (Kimura et
al., 1984; Choli et al., 1988a) with the exception of initiator
methionines at the amino termini and an additional lysine
(Lys66) at the carboxy terminus of the Sac7d protein in the
deduced sequence. The additional lysine can be explained
either by a failure to discern the final lysine in the amino
acid sequencing of the Sac7d or by posttranslational
carboxy-terminal processing to produce the mature protein. It should
be noted that Sac7d, Sac7e, and Sso7d all terminate with at
least two lysine residues (Figure 1).
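
The distinction drawn above between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) can be made explicit with a small classifier. The sketch below is illustrative only: the helper names are hypothetical, and the two short sequences in the usage example are invented to exercise the code, not taken from the sac7d or sac7e genes.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(base1, base2):
    """Return 'transition' or 'transversion' for two different bases."""
    if base1 == base2:
        raise ValueError("bases are identical")
    same_class = ({base1, base2} <= PURINES) or ({base1, base2} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        if classify(a, b) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(count_substitutions("ACGTACGT", "GCGTTCGC"))  # (2, 1)
```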

The data presented here indicate that there are only two 
Sac7 protein genes in S. acidocaldarius. Genes coding for 
Sac7 proteins other than Sac7d and e could not be detected. 
The failure to detect genes for the Sac7a and b proteins and 
the fact that the proteins appear to be simply truncated at 
the carboxy termini to various extents suggest that Sac7a 
and b result either from posttranslational modification at the
carboxy terminus or from proteolysis during protein isolation
and purification.

Promoter elements consistent with the archaeal "A-box"
and "B-box" consensus sequences have been located upstream
of the sac7d and sac7e protein coding sequences. The
agreement of the "A-box" sequence of sac7d with the
consensus "A-box" sequence is greater than that for
sac7e. This difference between the "A-box" of the promoter
elements in the two genes may explain the higher levels of
Sac7d relative to Sac7e in vivo (Grote et al., 1986).

There is significant sequence similarity in the regions of
sac7d and sac7e extending from the 5' end of the "A box"
to the initiation codon when the corresponding "A-" and "B-"
boxes are aligned. The two sequences also have similarly
placed pyrimidine-rich regions downstream of their termination
codons. These regions show similarity to the transcription
termination signals described for the Sulfolobus virus-like
particle SSV1, where transcription termination has been
shown to occur within pyrimidine-rich regions directly 3'
of the consensus TTTTTYT [reviewed in Brown et al.
(1989)]. Northern analysis of S. acidocaldarius RGJM RNA
probed with an oligonucleotide (oligonucleotide F, Table 1)
complementary to the common sequence at residues 305-324
of the two sac7 genes (Figure 3) showed hybridization
to a single size of transcripts (Shao and Gupta, unpublished
results), indicating that both transcripts terminate in similarly
placed regions. Thus, it is likely that the conserved
oligopyrimidine sequences of the two genes contain the
transcription termination signals.

Although the regions associated with transcription termination
are highly homologous, the sequences between these
regions and the termination codons are significantly different
in the sac7d and sac7e genes. Similarly, though the regions
encompassing the putative core promoter elements in the two
genes ("A-" and "B-" boxes) share extensive homology, the
sequences 5' of the "A-box" show less similarity. It would
appear that sufficient time has elapsed since the supposed
original gene duplication for the two sequences to diverge.
The conservation of cis-regulatory elements along with
coding regions in the two genes indicates that there is
selective pressure to maintain not only the expression of both
gene products but also a large part of their sequence. It is
not clear if there is more than one form of the Sso7 proteins.

Figure 11: Potential secondary structures for the 5'-terminal
regions of the sac7 RNA transcripts determined using Mulfold
(Jaeger et al., 1989a,b; Zuker, 1989). Initiator codons are shown
in lower case. Putative ribosome binding sequences GGUGA and
AGGU are indicated in bold and underlined formats, respectively.
Note that the AGGU sequences within the two transcripts are
located at different positions.

A typical ribosome binding site sequence upstream of
initiator ATG is not observed in either of the two sac7 genes
(Figure 3). This is not unusual, since many other Sulfolobus
genes also lack these sites (Amils et al., 1993; Dalgaard & 
Garrett, 1993). However, potential ribosome binding sites 
are observed downstream of the initiator codons of the two
sac7 genes which have precedents in other archaea. The
ribosome binding sites in certain halobacterial genes, which
have very short or no 5' untranslated regions, occur within
loops of potential hairpin structures in the 5' regions of the
transcripts (Brown et al., 1989; Amils et al., 1993). The
hairpin arrangement probably exposes these sites for
interaction with 16S rRNA. We note that the 5' regions of the
two sac7 transcripts can be folded into secondary structures
as shown in Figure 11. The sequence UCACCU near the 3'
end of 16S rRNA of Sulfolobus (Woese et al., 1984; Olsen
et al., 1985) potentially can either form five base pairs with
GGUGA within codons 1-3 or form four base pairs with
AGGU within codons 3-4 of the sac7d transcript.
Corresponding sequences in the sac7e transcript are GGCAA and
AAGU, respectively, which cannot form similar pairs with
the 16S rRNA. However, further downstream in the sac7e
transcript, there is AGGU within codons 5-6, which can
form four base pairs with the same UCACCU sequence of
the 16S rRNA; the corresponding site in sac7d is the less
efficient AAGU. Parts of these potential ribosome binding
sites do occur within single-stranded regions (Figure 11), as
is the case for the above-mentioned halobacterial genes.
The differences between the sequences and locations of the
potential ribosome binding sites of the two sac7 transcripts,
along with the previously mentioned differences in the
"A-box" sequences, may also explain the higher synthesis of
Sac7d protein.
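
The base-pair counts quoted for UCACCU against GGUGA, AGGU, GGCAA, and AAGU can be checked with a short script. What follows is a minimal sketch, assuming strict Watson-Crick pairing (no G-U wobble), antiparallel strands, and counting only the longest uninterrupted run of pairs. The function names are illustrative and are not part of the original analysis.

```python
# Watson-Crick complements for RNA; G-U wobble pairs are deliberately ignored
# (an assumption of this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that pairs perfectly, antiparallel, with `rna` (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_paired_run(rrna, mrna_site):
    """Length of the longest contiguous stretch of `mrna_site` that can
    base-pair with some stretch of `rrna`."""
    target = reverse_complement(rrna)
    best = 0
    for start in range(len(mrna_site)):
        for end in range(start + 1, len(mrna_site) + 1):
            if mrna_site[start:end] in target:
                best = max(best, end - start)
    return best


RRNA_3_END = "UCACCU"  # sequence near the 3' end of 16S rRNA, as quoted in the text

for site in ("GGUGA", "AGGU", "GGCAA", "AAGU"):
    print(site, longest_paired_run(RRNA_3_END, site))
```

Under these assumptions GGUGA and AGGU give runs of five and four pairs, while GGCAA and AAGU give only two each, which is in line with the statement that the latter cannot form similar pairs with the 16S rRNA.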

Kimura et al. (1984) have previously noted that the 
clustering of lysines in the amino terminus of these proteins
is reminiscent of that observed in eukaryotic HMG proteins.
Choli et al. (1988b) have also pointed out a slight sequence
similarity with E2A DNA-binding protein from adenovirus.
An extensive search of the currently available sequence
databases showed no significant homologies between the
Sac7d protein and any known chromatin or DNA-binding
protein. A BLAST search using the Sac7d sequence picked
up a 100% homology with the amino-terminal sequence (only
12 amino-terminal residues are known) of a small protein
(accession number S21168) from S. solfataricus which
apparently catalyzes disulfide bond formation (Guagliardi
et al., 1992). This report should be viewed with caution due
to the loss of activity upon cation exchange chromatography
of the protein. BLAST also picked up a high homology with
a reported p2 ribonuclease (Fusi et al., 1993) from S.
solfataricus with a sequence identical to the Sso7d protein
(Choli et al., 1988a). RNase activity for the 7 kDa protein
is surprising and remains to be confirmed. Preliminary
experiments indicate that the recombinant Sac7d protein does
not have RNase activity (Edmondson and Shriver, unpublished
results). The BLAST search also picked up some
weak homology with the 30S ribosomal protein S5 from E.
coli (P02356) and heat shock protein X16 from the African
clawed frog (A22175). A FASTA search using the Sac7d
sequence revealed some homology with elongation factor
1-δ (P29692), 30S ribosomal protein S8 (P24353), and
DNA-directed RNA polymerase subunit A' (P31813). A PROSITE
search using the Sac7d sequence revealed phosphocre; 
kinase phosphorylation sites at residues 17-19 (TSK),
40-42 (TGR), and 46-48 (SEK), and creatine kinase
phosphorylation sites at 33-36 (TYDD), and 46-49 (SH 
A BLOCKS analysis provided a single meaningful match
with ribosomal S5 protein. 

We have expressed the sac7d gene in the tightly controlled
BL21(DE3)pLysS E. coli expression system developed by
Studier et al. (1990) using the pET series of plasmids.
Accumulation of the sac7d gene product appears to be lethal
in E. coli. This is indicated perhaps most clearly by the
inability to establish the pET-3b/sac7d construct in
BL21(DE3). The additional regulation provided by the
lysozyme inhibition of T7 polymerase appears to be required.
The purified, recombinant protein can be isolated in
reasonable yield, e.g., typically, about 1 mg of protein p 
of wet weight E. coli cells is obtained, or approximately n 
that obtained for the native protein from S. acidocaldarius.
We have been unsuccessful in expressing the sac7e gene,
possibly due to its usage of codons rare in E. coli.

The recombinant Sac7d protein appears to be essentially
identical to the native Sac7 proteins in all respects except
for stability. The UV spectral extinction coefficients are
identical, as are the fluorescence excitation and emission
spectra. This is perhaps not surprising given that both are
largely due to a single tryptophan on the surface of the
protein (Edmondson, Qiu, and Shriver, manuscript submitted)
[see also Baumann et al. (1994) for the structure of Sso7d],
although the two tyrosines should be sensitive to differences
in structure. CD spectra are more sensitive to differences
in secondary structure content, and the spectra of the two
proteins are essentially identical, again indicating similar
structures for native and recombinant protein.

Analyses of the CD spectra using the variable selection
method of Johnson (Manavalan & Johnson, 1987) indicate
that Sac7d consists of 31% helix and 12-15% β-sheet. This
differs from the 52% α-helix, 12% β-sheet predicted by
sequence analysis algorithms in this work and the 1 
α-helix, 15% β-sheet predicted by Choli et al. (1988a) using
the average of four different prediction methods. All of these
methods significantly underestimate the amount of β-sheet
in Sac7d (42%) as determined from the NMR solution
structure (Edmondson, Qiu, and Shriver, manuscript submitted)
[see also Baumann et al. (1994)]. However, the helical
content determined by CD (31%) is close to that of the NMR
solution structure (22% α-helix, 11% 3₁₀-helix). An analysis
of the CD spectrum of Sac7e (Dijk & Reinhardt, 1986) using
the PG method (Provencher & Glöckner, 1981) gave a much
better estimate of β-sheet content (44%) but underestimated
the helical content (15%). The CD spectrum reported for
Sac7e (Dijk & Reinhardt, 1986) differs quantitatively from
that of native Sac7 and recombinant Sac7d presented here.
Further, the inability of the CD analyses to accurately
estimate the secondary structure content suggests that at least
part of the secondary structure contributions to the CD
spectra of the Sac7 proteins are not well represented in these
sets of reference proteins.

A more detailed, atomic level comparison of the structures
of the recombinant and native proteins can be obtained from
NMR. The "fingerprint" region of double-quantum filtered
COSY spectra of proteins shows the chemical shift
correlations of alpha and NH protons and is exquisitely sensitive
to the structure of the protein [see, for example, Wishart et
al. (1992)]. This permits a qualitative comparison of the
structure of the backbone of the two proteins which is more
detailed than that provided by optical spectra comparisons. 
The fingerprint regions of native and recombinant Sac7d 
protein are remarkably similar, indicating that the two 
proteins have very similar backbone folding patterns. 

The binding of the Sac7 proteins to double stranded DNA 
leads to a dramatic decrease in intrinsic tryptophan fluores- 
cence. The large signal allows for essentially noise-free 
titrations and accurate comparisons of the native and
recombinant protein binding function. The data presented
here indicate an affinity of 2 x W M"* and site size of 3.5 
base pairs for poly[dGdC]-poly[dGdC]. The agreement of 
quantitative binding parameters obtained for the native and 
recombinant proteins is additional evidence for essentially 
identical global folds for the two proteins. These binding 
studies are the first quantitative analysis of the binding of
the Sac7 proteins to DNA.
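
To illustrate how an affinity and a site size jointly shape a titration of a protein onto a DNA lattice, the sketch below evaluates the noncooperative McGhee-von Hippel isotherm. This model is an assumption made here for illustration; the passage above does not name the model used in the analysis. The association constant in the example is a placeholder rather than the value reported here, and only the site size of 3.5 base pairs is taken from the text.

```python
def free_protein_conc(nu, k_assoc, site_size):
    """Free protein concentration (M) needed to reach binding density `nu`
    (protein bound per base pair) under the noncooperative McGhee-von Hippel
    model:  nu / L = K (1 - n nu) [ (1 - n nu) / (1 - (n - 1) nu) ]^(n - 1)
    """
    if not 0.0 < nu < 1.0 / site_size:
        raise ValueError("nu must lie between 0 and the saturation limit 1/n")
    vacant = 1.0 - site_size * nu
    gap_term = (vacant / (1.0 - (site_size - 1.0) * nu)) ** (site_size - 1.0)
    return nu / (k_assoc * vacant * gap_term)


K_PLACEHOLDER = 1.0e6  # M^-1, hypothetical value chosen only for illustration
SITE_SIZE = 3.5        # base pairs, the site size quoted in the text

for nu in (0.05, 0.10, 0.20, 0.25):
    conc = free_protein_conc(nu, K_PLACEHOLDER, SITE_SIZE)
    print(f"nu = {nu:.2f}  free protein = {conc:.3e} M")
```

The steep rise in required free protein as nu approaches 1/n reflects the difficulty of filling the last gaps on a lattice when each ligand covers more than one base pair.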

Various prior studies of the 7 kDa DNA-binding proteins
from Sulfolobus have characterized the binding to nucleic
acids in a qualitative manner. Electron micrographs of the
7 kDa proteins from S. acidocaldarius complexed with DNA
indicated that the helix becomes increasingly compacted with
increasing ratios of protein to DNA (Dijk & Reinhardt, 1986;
Lurz et al., 1986). Filter binding studies confirmed that the
7 kDa proteins had an affinity for pBR322 DNA even at
relatively high salt concentrations (e.g., 0.265 M NaCl) which
was comparable to that observed for E. coli HU protein
(Grote et al., 1986; Choli et al., 1988a). Characterization
of the affinity for DNA in this work was in terms of percent
bound at a specific ratio of protein to DNA. DNA-melting
studies have also been performed on a small DNA-binding
protein from S. acidocaldarius, HSNP-C, with an amino acid
composition similar to the Sac7e protein, although the
sequence has not been presented. The protein increases the
Tm of double-stranded DNA (Reddy & Suryanarayana, 1989).
In addition, this protein demonstrated a significant quenching
of its intrinsic tryptophan fluorescence upon DNA binding,
although no quantitative analysis of the titrations was
performed.

Baumann et al. (1994) have recently presented some
fluorescence binding data for the homologous Sso7 proteins
from S. solfataricus. A quantitative analysis of the titrations
was not performed, but a visual inspection of the data
indicates a binding site size for double-stranded DNA of six
base pairs in low salt (0.02 M Tris, pH 7.4), nearly twice
that presented here. Assuming a site size of 3-6 base pairs,
the binding affinity in low salt is approximately 0.5 to 1 x
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10^ The thermal stability of poly[dIdC]-poly[dldC] was 
increased by approximately 40 °C in 5 mM Tris (pH 7.0).

The unfolding of both the native and recombinant proteins
is reversible, allowing for detailed, accurate characterization
of the thermodynamics of folding. In contrast to all other
physical parameters studied here, the energetics of folding
of the recombinant Sac7d protein differs significantly from
that of the native Sac7 proteins. The native protein unfolds
at pH 6.0 at 100 °C, remarkable given the absence of any
metal cofactors or disulfides. Surprisingly, the recombinant
protein unfolds with a Tm 6.5 °C less than the native. The
lower enthalpy of unfolding of the recombinant protein is 
not surprising and most likely results from a positive heat 
capacity change associated with unfolding. Any shift to 
lower temperature of an endotherm associated with a positive 
ΔCp will lead to a decrease in enthalpy since

    $\Delta C_p = \left( \partial \Delta H / \partial T \right)_p$

It is generally thought that a positive ΔCp of unfolding is
due to the exposure of internal hydrophobic residues
(Sturtevant, 1977; Privalov & Gill, 1988). The magnitude of the
change observed here is consistent with that observed for
other globular proteins (Privalov & Gill, 1988).
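
The dependence of the unfolding enthalpy on the transition temperature can be made concrete with a small calculation. The sketch below assumes a temperature-independent ΔCp, so that ΔH(T) = ΔH(T_ref) + ΔCp (T - T_ref). The two temperatures follow the text (unfolding at 100 °C for the native protein and 6.5 °C lower for the recombinant protein); the enthalpy and ΔCp values are hypothetical placeholders, not measured values from this work.

```python
def unfolding_enthalpy(t_celsius, dh_ref, t_ref_celsius, dcp):
    """Unfolding enthalpy at `t_celsius`, assuming a constant heat capacity
    change `dcp` (Kirchhoff's relation integrated from the reference point)."""
    return dh_ref + dcp * (t_celsius - t_ref_celsius)


# Hypothetical illustration values (kcal/mol and kcal/mol/K); assumptions only.
DH_NATIVE = 60.0
DCP = 0.5

T_NATIVE = 100.0                  # degrees C, native transition as stated in the text
T_RECOMBINANT = T_NATIVE - 6.5    # degrees C, recombinant transition

dh_recombinant = unfolding_enthalpy(T_RECOMBINANT, DH_NATIVE, T_NATIVE, DCP)
print(f"Enthalpy at {T_RECOMBINANT} C: {dh_recombinant} kcal/mol")
```

With any positive ΔCp the computed enthalpy at the lower transition temperature is smaller than at the reference temperature, which is the qualitative point made in the text.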

Maras et al. (1992) have previously noted that specific 
lysine monomethylation of glutamate dehydrogenase from 
S. solfataricus might be responsible for enhanced thermal 
stability of this enzyme relative to homologous mesophile 
forms. Baumann et al. (1994) have presented mass
spectroscopic evidence correlating methylation of the Sso7
protein with growth temperature, and they have suggested
that such a modification might be related to the stability of
the protein. The most straightforward way to determine if
methylation increases the thermostability of the protein would
be to compare the stabilities of the protein in its methylated
and unmethylated forms. Demethylation of the native protein
is not a trivial control experiment given the lack of
commercially available demethylases and, most importantly,
the specificity of reported demethylases (Paik & Kim, 1980).
In the absence of a demethylase, the preparation of an
unmethylated form is best accomplished using recombinant
protein. We have demonstrated here a significant difference
in the thermostability of native and recombinant Sac7 protein.
The only known difference between these proteins is the
ε-amino monomethylation of lysines 5 and 7 in the native
protein and the initiating methionine in the recombinant
protein. The lack of Lys66 in the reported amino acid
sequence of the native protein is presumably a sequencing
error, and this will be investigated in the NMR analysis of
the native protein. No other posttranslational modification,
such as phosphorylation or glycosylation, of the native or
recombinant Sac7 proteins was detectable. The current
evidence, therefore, strongly indicates that Sulfolobus can
increase the thermostability of some of its proteins by specific
lysine monomethylation.

We note that the level of specific methylation of Sac7 is 
variable and incomplete, i.e., the native preparation is
heterogeneous (Kimura et al., 1984; Choli et al., 1988a,b).
Choli et al. (1988b) report that the degree of
monomethylation of lysine 4 is 70%, 25%, and 20% in native Sac7a,
Sac7b, and Sac7d, respectively, while that for lysine 6 is
50%, 40%, and 50%, respectively. Heterogeneity would be
expected to lead to broadening of the endotherm, rather than
narrowing (see Figure 10). It would appear, therefore, that
stabilization might not require complete methylation of the
specific lysines.

Interestingly, we have been unable to increase the stability 
of the recombinant Sac7d protein by nonspecific, reductive 
methylation (McCrary and Shriver, unpublished results), a
process which leads to predominantly dimethylation (Means
& Feeney, 1971). Monomethylation changes the pKa of the
ε-amino group from 9.25 to 10.63, while dimethylation has
little further effect, giving a pKa of 10.78 (Paik & Kim, 1980).
Trimethylation returns the pKa to 9.8. Given the small
change in pKa and the fact that the difference is observed even
at pH 4.0, it is doubtful that an effect of monomethylation
on stability might be electrostatic in origin. A structural
explanation of the difference in stability must await a more
detailed comparison of the structures of the native and 
recombinant proteins. The spectroscopic data presented here 
would indicate that the structural differences are slight. 
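
The electrostatic argument above can be illustrated with the Henderson-Hasselbalch relation. The sketch below computes the fraction of ε-amino groups that carry a proton at pH 4.0 for each of the pKa values quoted in the text. It treats each group as an isolated, ideal titratable site, which is a simplifying assumption, and the names used are illustrative.

```python
def protonated_fraction(ph, pka):
    """Fraction of a basic group in its protonated (charged) form at a given
    pH, from the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values of the lysine epsilon-amino group as quoted in the text.
PKA_BY_METHYLATION = {
    "unmethylated": 9.25,
    "monomethyl": 10.63,
    "dimethyl": 10.78,
    "trimethyl": 9.8,
}

PH = 4.0
for state, pka in PKA_BY_METHYLATION.items():
    print(f"{state:13s} pKa {pka:5.2f}  protonated at pH {PH}: "
          f"{protonated_fraction(PH, pka):.6f}")
```

In every case the group is essentially fully protonated at pH 4.0, so under this simple model methylation leaves the charge state unchanged, which is in keeping with the doubt expressed about an electrostatic origin.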
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